Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] capabilites, take 2

5 views
Skip to first unread message

Andy Lutomirski

unread,
May 13, 2004, 4:20:08 PM5/13/04
to
This implements working capabilities.

Changes from the previous version:

- It's now just one patch (the split doesn't make sense anymore)
- Cleaned up cap_bprm_apply_creds
- Restrictions on setuid are somewhat stricter now
- Rediffed against 2.6.6-mm2 (didn't change anything)
- bprm is initialized correctly in fs/exec.c
- printk to remind users that the patch is enabled
- LSM builds correctly. Untested.

The whole thing is now a module parameter -- specify
commoncap.newcaps=1 if capabilities is built-in or newcaps=1 in
modprobe or insmod if not.

Known issues:

1. setxuid emulation is wrong (keepcaps doesn't work for setre(s)uid).
I haven't fixed this because I don't what to change the semantics
of PR_SET_KEEPCAPS. I'll probably add PR_SET_REALLY_KEEPCAPS
to really keep capabilities.

2. When newcaps=0 (the default), linuxrc gets all capabilities. This
is different from the old behavior. Init and programs started from
linuxrc still match the old behavior. I couldn't think of any
remotely clean way around that without breaking linuxrc for
newcaps=1

3. I haven't tried it, but I imagine that unloading commoncap and then
reloading it in a different mode may do unexpected things. So read
the code before you try that. I see no reason to fix this one because
the whole thing should go away eventually.

When newcaps=1, this extends the LSM interface:
bprm->secflags & BINPRM_SEC_NO_ELEVATE means that privileges should not
be elevated. So stacking modules (e.g. selinux) should set this before
calling to the lower module and test it again before deciding whether
to elevate privileges. That way, either all privs are elevated or none
are.

I also added a flag (BINPRM_SEC_SECUREEXEC) with the obvious meaning.
Otherwise cap_bprm_secureexec would have been a mess.

--Andy


Implement optional working capability support. Try to avoid giving Andrew
a heart attack. ;)

fs/exec.c | 16 ++++
include/linux/binfmts.h | 8 ++
include/linux/capability.h | 6 +
include/linux/init_task.h | 4 -
kernel/capability.c | 2
security/commoncap.c | 155 +++++++++++++++++++++++++++++++++++++++++----
6 files changed, 172 insertions(+), 19 deletions(-)

--- linux-2.6.6-mm2/fs/exec.c~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/fs/exec.c 2004-05-13 12:15:20.000000000 -0700
@@ -882,8 +882,10 @@

if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
/* Set-uid? */
- if (mode & S_ISUID)
+ if (mode & S_ISUID) {
bprm->e_uid = inode->i_uid;
+ bprm->secflags |= BINPRM_SEC_SETUID;
+ }

/* Set-gid? */
/*
@@ -891,10 +893,19 @@
* is a candidate for mandatory locking, not a setgid
* executable.
*/
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
bprm->e_gid = inode->i_gid;
+ bprm->secflags |= BINPRM_SEC_SETGID;
+ }
}

+ /* Pretend we have VFS capabilities */
+ cap_set_full(bprm->cap_inheritable);
+ if((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
+ cap_set_full(bprm->cap_permitted);
+ else
+ cap_clear(bprm->cap_permitted);
+
/* fill in binprm security blob */
retval = security_bprm_set(bprm);
if (retval)
@@ -1089,6 +1100,7 @@
bprm.loader = 0;
bprm.exec = 0;
bprm.security = NULL;
+ bprm.secflags = 0;
bprm.mm = mm_alloc();
retval = -ENOMEM;
if (!bprm.mm)
--- linux-2.6.6-mm2/security/commoncap.c~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/security/commoncap.c 2004-05-13 12:59:32.934690092 -0700
@@ -24,6 +24,11 @@
#include <linux/xattr.h>
#include <linux/hugetlb.h>

+int newcaps = 0;
+
+module_param(newcaps, int, 444);
+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
+
int cap_capable (struct task_struct *tsk, int cap)
{
/* Derived from include/linux/sched.h:capable. */
@@ -36,6 +41,11 @@
int cap_ptrace (struct task_struct *parent, struct task_struct *child)
{
/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
+ /* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
+ if (newcaps &&
+ !cap_issubset (child->cap_inheritable, current->cap_inheritable))
+ return -EPERM;
+
if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
!capable (CAP_SYS_PTRACE))
return -EPERM;
@@ -76,6 +86,11 @@
return -EPERM;
}

+ /* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
+ if (newcaps && !cap_issubset (*permitted, *inheritable)) {
+ return -EPERM;
+ }
+
return 0;
}

@@ -89,6 +104,8 @@

int cap_bprm_set_security (struct linux_binprm *bprm)
{
+ if (newcaps) return 0;
+
/* Copied from fs/exec.c:prepare_binprm. */

/* We don't have VFS support for capabilities yet */
@@ -115,10 +132,11 @@
return 0;
}

-void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
{
- /* Derived from fs/exec.c:compute_creds. */
+ /* This function will hopefully die in 2.7. */
kernel_cap_t new_permitted, working;
+ static int fixed_init = 0;

new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
working = cap_intersect (bprm->cap_inheritable,
@@ -151,6 +169,15 @@
current->cap_permitted = new_permitted;
current->cap_effective =
cap_intersect (new_permitted, bprm->cap_effective);
+ } else if (!fixed_init) {
+ /* This is not strictly correct, as it gives linuxrc more
+ * permissions than it used to have. It was the only way I
+ * could think of to keep the resulting disaster contained,
+ * though.
+ */
+ current->cap_effective = CAP_OLD_INIT_EFF_SET;
+ current->cap_inheritable = CAP_OLD_INIT_INH_SET;
+ fixed_init = 1;
}

/* AUD: Audit candidate if current->cap_effective is set */
@@ -158,15 +185,103 @@
current->keep_capabilities = 0;
}

+/*
+ * The rules of Linux capabilities (not POSIX!)
+ *
+ * What the masks mean:
+ * pI = capabilities that this process or its children may have
+ * pP = capabilities that this process has
+ * pE = capabilities that this process has and are enabled
+ * (so pE <= pP <= pI)
+ *
+ * The capability evolution rules are:
+ *
+ * pI' = pI & fI
+ * pP' = ((fP & cap_bset) | pP) & pI' & Y
+ * pE' = (setuid ? pP' : (pE & pP'))
+ *
+ * X = cap_bset
+ * Y is zero if uid!=0, euid==0, and setuid non-root
+ *
+ * Caveat: if (fP & ~pI'), there is no _theoretical_ problem, but
+ * this could introduce exploits in buggy programs. Since programs
+ * that aren't capability-aware are insecure _anyway_ if pP!=0, this
+ * is OK.
+ *
+ * To allow pI != ~0 to be secure in the presence of buggy programs,
+ * we require full pI for setuid.
+ *
+ * The moral is that, if file capabilities are introduced, programs
+ * that are granted fP > 0 need to be aware of how to deal with it.
+ * Because the effective set is left alone on non-setuid fP>0,
+ * such a program should drop capabilities that were not in its initial
+ * effective set before running untrusted code.
+ *
+ */
+void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+{
+ kernel_cap_t new_pI, new_pP;
+ kernel_cap_t fI, fP;
+ int is_setuid, is_setgid;
+
+ if(!newcaps) {
+ cap_bprm_apply_creds_compat(bprm, unsafe);
+ return;
+ }
+
+ fI = bprm->cap_inheritable;
+ fP = bprm->cap_permitted;
+ is_setuid = (bprm->secflags & BINPRM_SEC_SETUID);
+ is_setgid = (bprm->secflags & BINPRM_SEC_SETGID);
+
+ new_pI = cap_intersect(current->cap_inheritable, fI);
+
+ /* Check that it's safe to elevate privileges */
+ if (unsafe & ~LSM_UNSAFE_PTRACE_CAP)
+ bprm->secflags |= BINPRM_SEC_NOELEVATE;
+
+ /* FIXME: Is this overly harsh on setgid? */
+ if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
+ new_pI != CAP_FULL_SET)
+ bprm->secflags |= BINPRM_SEC_NOELEVATE;
+
+ if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
+ is_setuid = is_setgid = 0;
+ cap_clear(fP);
+ }
+
+ new_pP = cap_intersect(fP, cap_bset);
+ new_pP = cap_combine(new_pP, current->cap_permitted);
+ cap_mask(new_pP, new_pI);
+
+ /* setuid-nonroot is special. */
+ if (is_setuid && bprm->e_uid != 0 && current->uid != 0 &&
+ current->euid == 0)
+ cap_clear(new_pP);
+
+ if(!cap_issubset(new_pP, current->cap_permitted))
+ bprm->secflags |= BINPRM_SEC_SECUREEXEC;
+
+ /* Apply new security state */
+ if (is_setuid) {
+ current->suid = current->euid = current->fsuid = bprm->e_uid;
+ current->cap_effective = new_pP;
+ }
+ if (is_setgid)
+ current->sgid = current->egid = current->fsgid = bprm->e_gid;
+
+ current->cap_inheritable = new_pI;
+ current->cap_permitted = new_pP;
+ cap_mask(current->cap_effective, new_pP);
+
+ current->keep_capabilities = 0;
+}
+
int cap_bprm_secureexec (struct linux_binprm *bprm)
{
- /* If/when this module is enhanced to incorporate capability
- bits on files, the test below should be extended to also perform a
- test between the old and new capability sets. For now,
- it simply preserves the legacy decision algorithm used by
- the old userland. */
return (current->euid != current->uid ||
- current->egid != current->gid);
+ current->egid != current->gid ||
+ (bprm->secflags & BINPRM_SEC_SECUREEXEC));
}

int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,
@@ -280,9 +395,15 @@

void cap_task_reparent_to_init (struct task_struct *p)
{
- p->cap_effective = CAP_INIT_EFF_SET;
- p->cap_inheritable = CAP_INIT_INH_SET;
- p->cap_permitted = CAP_FULL_SET;
+ if (newcaps) {
+ cap_set_full(p->cap_inheritable);
+ cap_set_full(p->cap_permitted);
+ cap_set_full(p->cap_effective);
+ } else {
+ p->cap_effective = CAP_OLD_INIT_EFF_SET;
+ p->cap_inheritable = CAP_OLD_INIT_INH_SET;
+ p->cap_permitted = CAP_FULL_SET;
+ }
p->keep_capabilities = 0;
return;
}
@@ -367,6 +488,16 @@
return -ENOMEM;
}

+static int __init commoncap_init (void)
+{
+ if (newcaps) {
+ printk(KERN_NOTICE "Experimental capability support is on\n");
+ cap_bset = CAP_FULL_SET;
+ }
+
+ return 0;
+}
+
EXPORT_SYMBOL(cap_capable);
EXPORT_SYMBOL(cap_ptrace);
EXPORT_SYMBOL(cap_capget);
@@ -382,5 +513,7 @@
EXPORT_SYMBOL(cap_syslog);
EXPORT_SYMBOL(cap_vm_enough_memory);

+module_init(commoncap_init);
+
MODULE_DESCRIPTION("Standard Linux Common Capabilities Security Module");
MODULE_LICENSE("GPL");
--- linux-2.6.6-mm2/kernel/capability.c~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/kernel/capability.c 2004-05-13 11:42:51.000000000 -0700
@@ -13,7 +13,7 @@
#include <asm/uaccess.h>

unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */
-kernel_cap_t cap_bset = CAP_INIT_EFF_SET;
+kernel_cap_t cap_bset = CAP_OLD_INIT_EFF_SET;
int sysctl_mlock_group;

EXPORT_SYMBOL(securebits);
--- linux-2.6.6-mm2/include/linux/capability.h~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/capability.h 2004-05-13 11:42:51.000000000 -0700
@@ -308,8 +308,10 @@

#define CAP_EMPTY_SET to_cap_t(0)
#define CAP_FULL_SET to_cap_t(~0)
-#define CAP_INIT_EFF_SET to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
-#define CAP_INIT_INH_SET to_cap_t(0)
+
+/* For old-style capabilities, we use these. */
+#define CAP_OLD_INIT_EFF_SET to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
+#define CAP_OLD_INIT_INH_SET to_cap_t(0)

#define CAP_TO_MASK(x) (1 << (x))
#define cap_raise(c, flag) (cap_t(c) |= CAP_TO_MASK(flag))
--- linux-2.6.6-mm2/include/linux/binfmts.h~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/binfmts.h 2004-05-13 11:44:02.000000000 -0700
@@ -20,6 +20,10 @@
/*
* This structure is used to hold the arguments that are used when loading binaries.
*/
+#define BINPRM_SEC_SETUID 1
+#define BINPRM_SEC_SETGID 2
+#define BINPRM_SEC_SECUREEXEC 4
+#define BINPRM_SEC_NOELEVATE 8
struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
struct page *page[MAX_ARG_PAGES];
@@ -28,7 +32,9 @@
int sh_bang;
struct file * file;
int e_uid, e_gid;
- kernel_cap_t cap_inheritable, cap_permitted, cap_effective;
+ int secflags;
+ kernel_cap_t cap_inheritable, cap_permitted;
+ kernel_cap_t cap_effective; /* old caps -- do NOT use in new code */
void *security;
int argc, envc;
char * filename; /* Name of binary as seen by procps */
--- linux-2.6.6-mm2/include/linux/init_task.h~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/init_task.h 2004-05-13 11:42:51.000000000 -0700
@@ -92,8 +92,8 @@
.function = it_real_fn \
}, \
.group_info = &init_groups, \
- .cap_effective = CAP_INIT_EFF_SET, \
- .cap_inheritable = CAP_INIT_INH_SET, \
+ .cap_effective = CAP_FULL_SET, \
+ .cap_inheritable = CAP_FULL_SET, \
.cap_permitted = CAP_FULL_SET, \
.keep_capabilities = 0, \
.rlim = INIT_RLIMITS, \

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Chris Wright

unread,
May 13, 2004, 9:30:08 PM5/13/04
to
* Andy Lutomirski (lu...@myrealbox.com) wrote:
> Implement optional working capability support. Try to avoid giving Andrew
> a heart attack. ;)

I think it still needs more work. Default behavoiur is changed, like
Inheritble is full rather than clear, setpcap is enabled, etc. Also,
why do you change from Posix the way exec() updates capabilities? Sure,
there is no filesystem bits present, so this changes the calculation,
but I'm not convinced it's as secure this way. At least with newcaps=0.

I believe we can get something functional with fewer changes, hence
easier to understand the ramifications. In a nutshell, I'm still not
comfortable with this.

Also, it breaks my tests which try to drop privs and keep caps across
execve() which is really the only issue we're trying to solve ATM.


> --- linux-2.6.6-mm2/fs/exec.c~caps 2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/fs/exec.c 2004-05-13 12:15:20.000000000 -0700
> @@ -882,8 +882,10 @@
>
> if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
> /* Set-uid? */
> - if (mode & S_ISUID)
> + if (mode & S_ISUID) {
> bprm->e_uid = inode->i_uid;
> + bprm->secflags |= BINPRM_SEC_SETUID;
> + }
>
> /* Set-gid? */
> /*
> @@ -891,10 +893,19 @@
> * is a candidate for mandatory locking, not a setgid
> * executable.
> */
> - if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
> + if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
> bprm->e_gid = inode->i_gid;
> + bprm->secflags |= BINPRM_SEC_SETGID;
> + }
> }
>
> + /* Pretend we have VFS capabilities */
> + cap_set_full(bprm->cap_inheritable);

This looks sketchy.

> + if((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)

CodingStyle: add space after keyword 'if'
if ((...))

> + cap_set_full(bprm->cap_permitted);
> + else
> + cap_clear(bprm->cap_permitted);
> +
> /* fill in binprm security blob */
> retval = security_bprm_set(bprm);
> if (retval)
> @@ -1089,6 +1100,7 @@
> bprm.loader = 0;
> bprm.exec = 0;
> bprm.security = NULL;
> + bprm.secflags = 0;
> bprm.mm = mm_alloc();
> retval = -ENOMEM;
> if (!bprm.mm)
> --- linux-2.6.6-mm2/security/commoncap.c~caps 2004-05-13 11:42:26.000000000 -0700
> +++ linux-2.6.6-mm2/security/commoncap.c 2004-05-13 12:59:32.934690092 -0700
> @@ -24,6 +24,11 @@
> #include <linux/xattr.h>
> #include <linux/hugetlb.h>
>
> +int newcaps = 0;

make this:
static int newcaps;

> +
> +module_param(newcaps, int, 444);
> +MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
> +
> int cap_capable (struct task_struct *tsk, int cap)
> {
> /* Derived from include/linux/sched.h:capable. */
> @@ -36,6 +41,11 @@
> int cap_ptrace (struct task_struct *parent, struct task_struct *child)
> {
> /* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
> + /* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
> + if (newcaps &&
> + !cap_issubset (child->cap_inheritable, current->cap_inheritable))
> + return -EPERM;

Why no capable() override? In fact, is this check really necessary?

> +
> if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
> !capable (CAP_SYS_PTRACE))
> return -EPERM;
> @@ -76,6 +86,11 @@
> return -EPERM;
> }
>
> + /* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
> + if (newcaps && !cap_issubset (*permitted, *inheritable)) {
> + return -EPERM;
> + }
> +
> return 0;
> }
>
> @@ -89,6 +104,8 @@
>
> int cap_bprm_set_security (struct linux_binprm *bprm)
> {
> + if (newcaps) return 0;

CodingStyle:
if (newcaps)
return 0;

Hrm...

This was made unconditional. And how are you convinced it's safe?

> .cap_permitted = CAP_FULL_SET, \
> .keep_capabilities = 0, \
> .rlim = INIT_RLIMITS, \


thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Valdis.K...@vt.edu

unread,
May 13, 2004, 9:40:06 PM5/13/04
to
On Thu, 13 May 2004 18:20:10 PDT, Chris Wright said:

> I think it still needs more work. Default behavoiur is changed, like
> Inheritble is full rather than clear, setpcap is enabled, etc. Also,
> why do you change from Posix the way exec() updates capabilities? Sure,
> there is no filesystem bits present, so this changes the calculation,
> but I'm not convinced it's as secure this way. At least with newcaps=0.

The last time the "capabilities" thread reared its head a while ago, Andy made
a posting that pretty conclusively showed that the Posix way was totally b0rken
if you ever intended to support filesystem bits. So if you wanted to ever have
a snowball's chance of supporting something like:

chcap cap_net_raw+ep /bin/ping

so you could get rid of the set-UID bit on 'ping', you had to toss the Posix
propogation rules out the window. So we need to do either:

1) Toss the Posix out the window
2) Toss all the filesystems capabilities support out the window.

(I'm assuming that a suggestion that we make the choice a Kconfig option will
be met with the sound of many kernel hackers either retching in disgust or
screaming in horror ;)

Andy Lutomirski

unread,
May 13, 2004, 10:40:05 PM5/13/04
to
Chris Wright wrote:

> * Andy Lutomirski (lu...@myrealbox.com) wrote:
>
>>Implement optional working capability support. Try to avoid giving Andrew
>>a heart attack. ;)
>
>
> I think it still needs more work. Default behavoiur is changed, like

> Inheritble is full rather than clear, setpcap is enabled, etc. ...

In cap_bprm_apply_creds_compat:

+ } else if (!fixed_init) {
+ /* This is not strictly correct, as it gives linuxrc more
+ * permissions than it used to have. It was the only way I
+ * could think of to keep the resulting disaster contained,
+ * though.
+ */
+ current->cap_effective = CAP_OLD_INIT_EFF_SET;
+ current->cap_inheritable = CAP_OLD_INIT_INH_SET;
+ fixed_init = 1;

So that it gets changed back. Otherwise linuxrc ran without permissions
and my drives never got mounted. Yah, it's ugly -- I'm open to
suggestions to avoid this.

> ... Also,


> why do you change from Posix the way exec() updates capabilities? Sure,
> there is no filesystem bits present, so this changes the calculation,
> but I'm not convinced it's as secure this way. At least with newcaps=0.

I'm not convinced that Posix's version makes any sense. Also, there are
apparently a number of drafts around which disagree on what the right
rules are. (My copy, for example, matches the old rules exactly, but
the old rules caused the sendmail problem.) And, under Posix, what does
the inheritable mask mean, anyway?

Also, I don't find the posix rules to be useful (why is there an
inheritable mask if all it does is to cause caps to be dropped on
exec, when the user could just manually drop them?).

>
> I believe we can get something functional with fewer changes, hence
> easier to understand the ramifications. In a nutshell, I'm still not
> comfortable with this.

I'll play with it, but I think this is the shortest patch I've come up
with. I'll admit that touching this stuff scares me too, but I'd rather
redo it that try and patch it over again.

>
> Also, it breaks my tests which try to drop privs and keep caps across
> execve() which is really the only issue we're trying to solve ATM.

Can you send me a sample of what breaks? I do:

[root@luto tmp]# cap -c = ls /home/andy
ls: /home/andy: Permission denied
[root@luto tmp]# echo test >foo
[root@luto tmp]# chmod 700 foo
[root@luto tmp]# su andy -c 'cat foo'
cat: foo: Permission denied
[root@luto tmp]# cap -c '= cap_dac_read_search=eip' -u andy cat foo
test
[root@luto tmp]# cap -c '= cap_dac_read_search=eip' -u andy bash
[andy@luto tmp]$ whoami
andy
[andy@luto tmp]$ dumpcap
Real Eff
User 500 500
Group 500 500

Caps: = cap_dac_read_search+eip
[andy@luto tmp]$ cat foo
test

Which looks exactly right to me.
(cap and dumpcap live at www.stanford.edu/~luto/cap/)

>>--- linux-2.6.6-mm2/fs/exec.c~caps 2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/fs/exec.c 2004-05-13 12:15:20.000000000 -0700
>>@@ -882,8 +882,10 @@
>>
>> if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
>> /* Set-uid? */
>>- if (mode & S_ISUID)
>>+ if (mode & S_ISUID) {
>> bprm->e_uid = inode->i_uid;
>>+ bprm->secflags |= BINPRM_SEC_SETUID;
>>+ }
>>
>> /* Set-gid? */
>> /*
>>@@ -891,10 +893,19 @@
>> * is a candidate for mandatory locking, not a setgid
>> * executable.
>> */
>>- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
>>+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>> bprm->e_gid = inode->i_gid;
>>+ bprm->secflags |= BINPRM_SEC_SETGID;
>>+ }
>> }
>>
>>+ /* Pretend we have VFS capabilities */
>>+ cap_set_full(bprm->cap_inheritable);
>
>
> This looks sketchy.

My concept of 'inheritable' is that caps that are _not_ inheritable
may never be gained by this task or its children. So a process
should normally have all caps inheritable.

>
>
>>+ if((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
>
>
> CodingStyle: add space after keyword 'if'

Fixed.

> if ((...))
>
>
>>+ cap_set_full(bprm->cap_permitted);
>>+ else
>>+ cap_clear(bprm->cap_permitted);
>>+
>> /* fill in binprm security blob */
>> retval = security_bprm_set(bprm);
>> if (retval)
>>@@ -1089,6 +1100,7 @@
>> bprm.loader = 0;
>> bprm.exec = 0;
>> bprm.security = NULL;
>>+ bprm.secflags = 0;
>> bprm.mm = mm_alloc();
>> retval = -ENOMEM;
>> if (!bprm.mm)
>>--- linux-2.6.6-mm2/security/commoncap.c~caps 2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/security/commoncap.c 2004-05-13 12:59:32.934690092 -0700
>>@@ -24,6 +24,11 @@
>> #include <linux/xattr.h>
>> #include <linux/hugetlb.h>
>>
>>+int newcaps = 0;
>
>
> make this:
> static int newcaps;

Fixed.

>
>
>>+
>>+module_param(newcaps, int, 444);
>>+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
>>+
>> int cap_capable (struct task_struct *tsk, int cap)
>> {
>> /* Derived from include/linux/sched.h:capable. */
>>@@ -36,6 +41,11 @@
>> int cap_ptrace (struct task_struct *parent, struct task_struct *child)
>> {
>> /* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
>>+ /* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
>>+ if (newcaps &&
>>+ !cap_issubset (child->cap_inheritable, current->cap_inheritable))
>>+ return -EPERM;
>
>
> Why no capable() override? In fact, is this check really necessary?

If task A has less inheritable caps than B, then A is somehow less trusted
and has no business tracing B.

A concrete example: a system runs with very restricted inheritable caps
on all processes except for a magic daemon. The magic daemon holds on
to CAP_SYS_ADMIN to umount everything at shutdown. If the rest of the
system gets rooted, it still shouldn't be possible to trace the daemon.
(Yes, this is currently not workable -- I plan to add a sysctl that sets
what inheritable caps a task must have for setuid to work. The blanket
requirement that _all_ must be present is to avoid bugs in which a
setuid program assumes it will be fully privileged.)

>
>
>>+
>> if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
>> !capable (CAP_SYS_PTRACE))
>> return -EPERM;
>>@@ -76,6 +86,11 @@
>> return -EPERM;
>> }
>>
>>+ /* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
>>+ if (newcaps && !cap_issubset (*permitted, *inheritable)) {
>>+ return -EPERM;
>>+ }
>>+
>> return 0;
>> }
>>
>>@@ -89,6 +104,8 @@
>>
>> int cap_bprm_set_security (struct linux_binprm *bprm)
>> {
>>+ if (newcaps) return 0;
>
>
> CodingStyle:
> if (newcaps)
> return 0;

Fixed.

Yup. It sucks. I didn't want to touch the system startup code, though,
and this is the closest LSM comes. It's too bad there's no LSM hook to do
this right (although I suppose I could add one, but that would break
capabilities as a module).

Speaking of which, this is a genuine problem if commoncap is a module
and newcaps=0 -- this code will probably never run. Does it matter?


[lots of patch snipped]

>>--- linux-2.6.6-mm2/include/linux/init_task.h~caps 2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/include/linux/init_task.h 2004-05-13 11:42:51.000000000 -0700
>>@@ -92,8 +92,8 @@
>> .function = it_real_fn \
>> }, \
>> .group_info = &init_groups, \
>>- .cap_effective = CAP_INIT_EFF_SET, \
>>- .cap_inheritable = CAP_INIT_INH_SET, \
>>+ .cap_effective = CAP_FULL_SET, \
>>+ .cap_inheritable = CAP_FULL_SET, \
>
>
> This was made unconditional. And how are you convinced it's safe?

Same as above. But even if it were really unconditional (instead
of just sort-of unconditional), I still think it's safe, because
(in newcaps=0 mode) only root can get CAP_SETPCAP. Root
could always just insert a module to enable it. So granting it
costs nothing.

On the other hand, safety is good, and I can't see any way in the
old system for anything to get CAP_SETPCAP. I'll send in a new
patch that just disables CAP_SETPCAP when newcaps=0.


>
>
>> .cap_permitted = CAP_FULL_SET, \
>> .keep_capabilities = 0, \
>> .rlim = INIT_RLIMITS, \
>
>
>
> thanks,
> -chris


Thanks,
Andy

Andy Lutomirski

unread,
May 13, 2004, 10:50:06 PM5/13/04
to
Changes:

- Fixed a couple CodingStyle things (thanks, Chris)
- newcaps=0 explicity bans CAP_SETPCAP.

The latter is to prevent any possibility of capset abuse
in compatibility mode.

I still haven't tried to get CAP_SETPCAP with commoncap as a module,
but it seems match the old behavior (cat /proc/1/status) when built-in
(modulo linuxrc).

I suppose that if people really are scared of CAP_SETPCAP, then this
is a good precaution, because linuxrc's children will have it with
my patch.

I'm still open to suggestions for the real right fix. I guess I
could make it a real boot parameter instead of a module parameter,
but that's ugly.

As for Posix caps, is there any good reason to follow Posix? I
don't know of any OS that has Posix caps except Linux, and they're
broken. The spec was dropped, anyway.

--Andy


fs/exec.c | 16 +++-
include/linux/binfmts.h | 8 +-


include/linux/capability.h | 6 +
include/linux/init_task.h | 4 -
kernel/capability.c | 2

security/commoncap.c | 159 +++++++++++++++++++++++++++++++++++++++++----
6 files changed, 176 insertions(+), 19 deletions(-)


--- linux-2.6.6-mm2/fs/exec.c~caps 2004-05-13 11:42:26.000000000 -0700

+++ linux-2.6.6-mm2/fs/exec.c 2004-05-13 18:46:36.000000000 -0700


@@ -882,8 +882,10 @@

if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
/* Set-uid? */
- if (mode & S_ISUID)
+ if (mode & S_ISUID) {
bprm->e_uid = inode->i_uid;
+ bprm->secflags |= BINPRM_SEC_SETUID;
+ }

/* Set-gid? */
/*
@@ -891,10 +893,19 @@
* is a candidate for mandatory locking, not a setgid
* executable.
*/
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
bprm->e_gid = inode->i_gid;
+ bprm->secflags |= BINPRM_SEC_SETGID;
+ }
}

+ /* Pretend we have VFS capabilities */
+ cap_set_full(bprm->cap_inheritable);

+ if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)


+ cap_set_full(bprm->cap_permitted);
+ else
+ cap_clear(bprm->cap_permitted);
+
/* fill in binprm security blob */
retval = security_bprm_set(bprm);
if (retval)
@@ -1089,6 +1100,7 @@
bprm.loader = 0;
bprm.exec = 0;
bprm.security = NULL;
+ bprm.secflags = 0;
bprm.mm = mm_alloc();
retval = -ENOMEM;
if (!bprm.mm)
--- linux-2.6.6-mm2/security/commoncap.c~caps 2004-05-13 11:42:26.000000000 -0700

+++ linux-2.6.6-mm2/security/commoncap.c 2004-05-13 19:28:27.000000000 -0700
@@ -24,9 +24,17 @@
#include <linux/xattr.h>
#include <linux/hugetlb.h>

+static int newcaps = 0;


+
+module_param(newcaps, int, 444);
+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
+
int cap_capable (struct task_struct *tsk, int cap)
{
/* Derived from include/linux/sched.h:capable. */

+ if (unlikely(!newcaps && cap == CAP_SETPCAP))
+ return -EPERM;
+
if (cap_raised (tsk->cap_effective, cap))
return 0;
else
@@ -36,6 +44,11 @@


int cap_ptrace (struct task_struct *parent, struct task_struct *child)
{
/* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
+ /* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
+ if (newcaps &&
+ !cap_issubset (child->cap_inheritable, current->cap_inheritable))
+ return -EPERM;

+
if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
!capable (CAP_SYS_PTRACE))
return -EPERM;

@@ -76,6 +89,11 @@


return -EPERM;
}

+ /* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
+ if (newcaps && !cap_issubset (*permitted, *inheritable)) {
+ return -EPERM;
+ }
+
return 0;
}

@@ -89,6 +107,9 @@



int cap_bprm_set_security (struct linux_binprm *bprm)
{
+ if (newcaps)

+ return 0;


+
/* Copied from fs/exec.c:prepare_binprm. */

/* We don't have VFS support for capabilities yet */

@@ -115,10 +136,11 @@


return 0;
}

-void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
{
- /* Derived from fs/exec.c:compute_creds. */
+ /* This function will hopefully die in 2.7. */
kernel_cap_t new_permitted, working;
+ static int fixed_init = 0;

new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
working = cap_intersect (bprm->cap_inheritable,

@@ -151,6 +173,15 @@


current->cap_permitted = new_permitted;
current->cap_effective =
cap_intersect (new_permitted, bprm->cap_effective);
+ } else if (!fixed_init) {
+ /* This is not strictly correct, as it gives linuxrc more
+ * permissions than it used to have. It was the only way I
+ * could think of to keep the resulting disaster contained,
+ * though.
+ */
+ current->cap_effective = CAP_OLD_INIT_EFF_SET;
+ current->cap_inheritable = CAP_OLD_INIT_INH_SET;
+ fixed_init = 1;
}

/* AUD: Audit candidate if current->cap_effective is set */

@@ -158,15 +189,103 @@

+ if (!cap_issubset(new_pP, current->cap_permitted))

@@ -280,9 +399,15 @@



void cap_task_reparent_to_init (struct task_struct *p)
{
- p->cap_effective = CAP_INIT_EFF_SET;
- p->cap_inheritable = CAP_INIT_INH_SET;
- p->cap_permitted = CAP_FULL_SET;
+ if (newcaps) {
+ cap_set_full(p->cap_inheritable);
+ cap_set_full(p->cap_permitted);
+ cap_set_full(p->cap_effective);
+ } else {
+ p->cap_effective = CAP_OLD_INIT_EFF_SET;
+ p->cap_inheritable = CAP_OLD_INIT_INH_SET;
+ p->cap_permitted = CAP_FULL_SET;
+ }
p->keep_capabilities = 0;
return;
}

@@ -367,6 +492,16 @@


return -ENOMEM;
}

+static int __init commoncap_init (void)
+{
+ if (newcaps) {
+ printk(KERN_NOTICE "Experimental capability support is on\n");
+ cap_bset = CAP_FULL_SET;
+ }
+
+ return 0;
+}
+
EXPORT_SYMBOL(cap_capable);
EXPORT_SYMBOL(cap_ptrace);
EXPORT_SYMBOL(cap_capget);

@@ -382,5 +517,7 @@

.cap_permitted = CAP_FULL_SET, \
.keep_capabilities = 0, \
.rlim = INIT_RLIMITS, \

-

Chris Wright

unread,
May 14, 2004, 1:00:08 AM5/14/04
to
* Andy Lutomirski (lu...@myrealbox.com) wrote:
> Chris Wright wrote:
>
> > * Andy Lutomirski (lu...@myrealbox.com) wrote:
> >
> >>Implement optional working capability support. Try to avoid giving Andrew
> >>a heart attack. ;)
> >
> > I think it still needs more work. Default behavoiur is changed, like
> > Inheritble is full rather than clear, setpcap is enabled, etc. ...
>
> In cap_bprm_apply_creds_compat:
>
> + } else if (!fixed_init) {
> + /* This is not strictly correct, as it gives linuxrc more
> + * permissions than it used to have. It was the only way I
> + * could think of to keep the resulting disaster contained,
> + * though.
> + */
> + current->cap_effective = CAP_OLD_INIT_EFF_SET;
> + current->cap_inheritable = CAP_OLD_INIT_INH_SET;
> + fixed_init = 1;
>
> So that it gets changed back. Otherwise linuxrc ran without permissions
> and my drives never got mounted. Yah, it's ugly -- I'm open to
> suggestions to avoid this.

I tested as a module, and this doesn't run AFAICS

> > ... Also,
> > why do you change from Posix the way exec() updates capabilities? Sure,
> > there is no filesystem bits present, so this changes the calculation,
> > but I'm not convinced it's as secure this way. At least with newcaps=0.
>
> I'm not convinced that Posix's version makes any sense. Also, there are
> apparently a number of drafts around which disagree on what the right
> rules are. (My copy, for example, matches the old rules exactly, but
> the old rules caused the sendmail problem.) And, under Posix, what does
> the inheritable mask mean, anyway?
>
> Also, I don't find the posix rules to be useful (why is there an
> inheritable mask if all it does is to cause caps to be dropped on
> exec, when the user could just manually drop them?).

Not sure if it's defensible, but it allows passing on an inheritable
capability through an intermediate process that simply can't inherit
that capability. This is not unlike requiring an unprivileged process
to ask a privileged process for it to do something on it's behalf.
Certainly it's implicit that you trust the privileged process.

> > I believe we can get something functional with fewer changes, hence
> > easier to understand the ramifications. In a nutshell, I'm still not
> > comfortable with this.
>
> I'll play with it, but I think this is the shortest patch I've come up
> with. I'll admit that touching this stuff scares me too, but I'd rather
> redo it that try and patch it over again.

Yeah, the subtleties are unnerving.

> > Also, it breaks my tests which try to drop privs and keep caps across
> > execve() which is really the only issue we're trying to solve ATM.
>
> Can you send me a sample of what breaks? I do:

Yes. It's something like this:

keepcaps
dropuid
drop caps
execve()

The caps that are left are like this. (effective == inheritable) which
are a subset of permitted.

This is the diff with Posix, which allows a process to inherit a
capability that it can never exercise. However, it could pass the
capablity on to someone else who could inherit it.

<snip>


> >>@@ -36,6 +41,11 @@
> >> int cap_ptrace (struct task_struct *parent, struct task_struct *child)
> >> {
> >> /* Derived from arch/i386/kernel/ptrace.c:sys_ptrace. */
> >>+ /* CAP_SYS_PTRACE still can't bypass inheritable restrictions */
> >>+ if (newcaps &&
> >>+ !cap_issubset (child->cap_inheritable, current->cap_inheritable))
> >>+ return -EPERM;
> >
> >
> > Why no capable() override? In fact, is this check really necessary?
>
> If task A has less inheritable caps than B, then A is somehow less trusted
> and has no business tracing B.

But it's not less. It's just not a subset. Task B could have only one
inheritable cap, while A could have all but the one cap that B has. In
fact, that could be CAP_SYS_PTRACE. So the threat is task A tracing B,
and using it to pass on capabilities that it wasn't allowed to pass on.
Which is what the permitted test was for before, and what CAP_SYS_PTRACE
was used to override.

> A concrete example: a system runs with very restricted inheritable caps
> on all processes except for a magic daemon. The magic daemon holds on
> to CAP_SYS_ADMIN to umount everything at shutdown. If the rest of the
> system gets rooted, it still shouldn't be possible to trace the daemon.
> (Yes, this is currently not workable -- I plan to add a sysctl that sets
> what inheritable caps a task must have for setuid to work. The blanket
> requirement that _all_ must be present is to avoid bugs in which a
> setuid program assumes it will be fully privileged.)

I suppose this eliminates the usefulness of CAP_SYS_PTRACE.

> >> if (!cap_issubset (child->cap_permitted, current->cap_permitted) &&
> >> !capable (CAP_SYS_PTRACE))
> >> return -EPERM;
> >>@@ -76,6 +86,11 @@
> >> return -EPERM;
> >> }
> >>
> >>+ /* verify the _new_Permitted_ is a subset of the _new_Inheritable_ */
> >>+ if (newcaps && !cap_issubset (*permitted, *inheritable)) {
> >>+ return -EPERM;
> >>+ }

This is what breaks my code. I specifcally test permitting the parent to
grab back some caps, but never give them away.

> >> return 0;

As a module it loosk like.

$ grep ^Cap /proc/1/status
CapInh: 00000000ffffffff
CapPrm: 00000000ffffffff
CapEff: 00000000ffffffff

> Speaking of which, this is a genuine problem if commoncap is a module
> and newcaps=0 -- this code will probably never run. Does it matter?

Yup, that's the case I tested. It breaks something that's working
normally now...

> [lots of patch snipped]
>
> >>--- linux-2.6.6-mm2/include/linux/init_task.h~caps 2004-05-13 11:42:26.000000000 -0700
> >>+++ linux-2.6.6-mm2/include/linux/init_task.h 2004-05-13 11:42:51.000000000 -0700
> >>@@ -92,8 +92,8 @@
> >> .function = it_real_fn \
> >> }, \
> >> .group_info = &init_groups, \
> >>- .cap_effective = CAP_INIT_EFF_SET, \
> >>- .cap_inheritable = CAP_INIT_INH_SET, \
> >>+ .cap_effective = CAP_FULL_SET, \
> >>+ .cap_inheritable = CAP_FULL_SET, \
> >
> >
> > This was made unconditional. And how are you convinced it's safe?
>
> Same as above. But even if it were really unconditional (instead
> of just sort-of unconditional), I still think it's safe, because
> (in newcaps=0 mode) only root can get CAP_SETPCAP. Root
> could always just insert a module to enable it. So granting it
> costs nothing.
>
> On the other hand, safety is good, and I can't see any way in the
> old system for anything to get CAP_SETPCAP. I'll send in a new
> patch that just disables CAP_SETPCAP when newcaps=0.

OK.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Chris Wright

unread,
May 14, 2004, 1:00:09 AM5/14/04
to
* Valdis.K...@vt.edu (Valdis.K...@vt.edu) wrote:
> On Thu, 13 May 2004 18:20:10 PDT, Chris Wright said:
>
> > I think it still needs more work. Default behavoiur is changed, like
> > Inheritble is full rather than clear, setpcap is enabled, etc. Also,
> > why do you change from Posix the way exec() updates capabilities? Sure,
> > there is no filesystem bits present, so this changes the calculation,
> > but I'm not convinced it's as secure this way. At least with newcaps=0.
>
> The last time the "capabilities" thread reared its head a while ago, Andy made
> a posting that pretty conclusively showed that the Posix way was totally b0rken
> if you ever intended to support filesystem bits. So if you wanted to ever have
> a snowball's chance of supporting something like:
>
> chcap cap_net_raw+ep /bin/ping

It's still not clear that's what we want. For now, just being able to have
the sucap equiv to sudo would be nice. And it's very uncomfortable to
change mainline in subtle ways that could break security during stable
series.

Chris Wright

unread,
May 14, 2004, 1:10:05 AM5/14/04
to
* Andy Lutomirski (lu...@myrealbox.com) wrote:
> As for Posix caps, is there any good reason to follow Posix? I
> don't know of any OS that has Posix caps except Linux, and they're
> broken. The spec was dropped, anyway.

Well, I wonder what IRIX does? They support capabilities and had a
reasonable sized hand in the draft. No point in making it impossible
to port apps. It's not that I'm a strong fan of Posix caps, but
compatibility (even with a partially complete draft) with defacto
standards is not entirely unreasonable.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Valdis.K...@vt.edu

unread,
May 14, 2004, 1:40:05 AM5/14/04
to
On Thu, 13 May 2004 22:04:15 PDT, Chris Wright said:

> Well, I wonder what IRIX does? They support capabilities and had a
> reasonable sized hand in the draft. No point in making it impossible
> to port apps. It's not that I'm a strong fan of Posix caps, but
> compatibility (even with a partially complete draft) with defacto
> standards is not entirely unreasonable.

The IRIX 6.5 manpage says:


A process has three, possibly empty, sets of capabilities. The permitted
capability set is the maximum set of capabilities for the process. The
effective capability set contains those capabilities that are currently
active for the process. The inherited capability set contains those
capabilities that the process may pass to the next process image across
exec(2).

Only capabilities in a process' effective capability set allow the
process to perform restricted operations. A process may use capability
management functions to add or remove capabilities from its effective
capability set. However the capabilities that a process can make
effective are limited to those that exist in its permitted capability
set.

Only capabilities in the process' inherited capability set can be passed
across exec(2).

Capabilities are also associated with files. There may or may not be a
capability set associated with a specific file. If a file has no
capability set, execution of this file through an exec(2) will leave the
process' capability set unchanged. If a file has a capability set,
execution of file will affect the process' capability set in the
following way: a file's inherited capability set further constrains the
process inherited capabilities that are passed from one process image to
another. The file's permitted capability set contains the capabilities
that are unconditionally permitted to a process upon execution of that
file. The file's effective capabilities are the capabilities that become
immediately active for the process upon execution of the file.

More precisely described, the process capability assignment algorithm is:

I-proc-new = I-proc-old & I-file
P-proc-new = P-file | (I-proc-new & P-proc-old)
E-proc-new = P-proc-new & E-file

File capabilities are supported only on XFS filesystems.

Note that Irix has *3* capability vectors attached to a file....

Olaf Dietsche

unread,
May 14, 2004, 1:40:06 AM5/14/04
to
Valdis.K...@vt.edu writes:

> On Thu, 13 May 2004 18:20:10 PDT, Chris Wright said:
>
>> I think it still needs more work. Default behavoiur is changed, like
>> Inheritble is full rather than clear, setpcap is enabled, etc. Also,
>> why do you change from Posix the way exec() updates capabilities? Sure,
>> there is no filesystem bits present, so this changes the calculation,
>> but I'm not convinced it's as secure this way. At least with newcaps=0.
>
> The last time the "capabilities" thread reared its head a while ago, Andy made
> a posting that pretty conclusively showed that the Posix way was totally b0rken
> if you ever intended to support filesystem bits. So if you wanted to ever have
> a snowball's chance of supporting something like:
>
> chcap cap_net_raw+ep /bin/ping

Seems like you're not aware of:
<http://www.olafdietsche.de/linux/capability/>

This supports filesystem capabilities with the current (POSIX?)
implementation. So, whatever Andy has shown, it has at least one
counter evidence q.e.d.

> 2) Toss all the filesystems capabilities support out the window.

I agree to disagree ;-)

Regards, Olaf.

Chris Wright

unread,
May 14, 2004, 1:50:07 AM5/14/04
to
* Valdis.K...@vt.edu (Valdis.K...@vt.edu) wrote:
> On Thu, 13 May 2004 22:04:15 PDT, Chris Wright said:
>
> > Well, I wonder what IRIX does? They support capabilities and had a
> > reasonable sized hand in the draft. No point in making it impossible
> > to port apps. It's not that I'm a strong fan of Posix caps, but
> > compatibility (even with a partially complete draft) with defacto
> > standards is not entirely unreasonable.
>
> The IRIX 6.5 manpage says:

Thanks.

OK, this is precisely POSIX as I expected. No surprise given the folks
involved.

<snip manpage>

> Note that Irix has *3* capability vectors attached to a file....

This is what POSIX specifies (with as much authority as a withdrawn spec
has ;-)

Andy Lutomirski

unread,
May 14, 2004, 2:10:04 AM5/14/04
to
[Pardon my gross butchery and reordering.]

Compatibility (i.e. newcaps=0):

Chris Wright wrote:
> * Andy Lutomirski (lu...@myrealbox.com) wrote:
>
>>Chris Wright wrote:
>>
>>
>>>* Andy Lutomirski (lu...@myrealbox.com) wrote:

>>>I think it still needs more work. Default behavoiur is changed, like
>>>Inheritble is full rather than clear, setpcap is enabled, etc. ...
>>
>>In cap_bprm_apply_creds_compat:
>>
>>+ } else if (!fixed_init) {
>>+ /* This is not strictly correct, as it gives linuxrc more
>>+ * permissions than it used to have. It was the only way I
>>+ * could think of to keep the resulting disaster contained,
>>+ * though.
>>+ */
>>+ current->cap_effective = CAP_OLD_INIT_EFF_SET;
>>+ current->cap_inheritable = CAP_OLD_INIT_INH_SET;
>>+ fixed_init = 1;
>>
>>So that it gets changed back. Otherwise linuxrc ran without permissions
>>and my drives never got mounted. Yah, it's ugly -- I'm open to
>>suggestions to avoid this.
>
>
> I tested as a module, and this doesn't run AFAICS

OK -- I give in. I'll redo it as a kernel (non-module) boot parameter.
And touch some more files that have no business being capability-aware
while I'm at it :(


The inheritable mask:

>>>Also, it breaks my tests which try to drop privs and keep caps across
>>>execve() which is really the only issue we're trying to solve ATM.
>>
>>Can you send me a sample of what breaks? I do:
>
>
> Yes. It's something like this:
>
> keepcaps
> dropuid
> drop caps
> execve()
>
> The caps that are left are like this. (effective == inheritable) which
> are a subset of permitted.
>

Is (eff == inh) what happens or what should happen? If the former, then
my patch is broken. If the latter, either I'm confused, or see below.

>>>why do you change from Posix the way exec() updates capabilities? Sure,
>>>there is no filesystem bits present, so this changes the calculation,
>>>but I'm not convinced it's as secure this way. At least with newcaps=0.
>>
>>I'm not convinced that Posix's version makes any sense. Also, there are
>>apparently a number of drafts around which disagree on what the right
>>rules are. (My copy, for example, matches the old rules exactly, but
>>the old rules caused the sendmail problem.) And, under Posix, what does
>>the inheritable mask mean, anyway?
>>
>>Also, I don't find the posix rules to be useful (why is there an
>>inheritable mask if all it does is to cause caps to be dropped on
>>exec, when the user could just manually drop them?).
>
>
> Not sure if it's defensible, but it allows passing on an inheritable
> capability through an intermediate process that simply can't inherit
> that capability. This is not unlike requiring an unprivileged process
> to ask a privileged process for it to do something on it's behalf.
> Certainly it's implicit that you trust the privileged process.
>

[and:]


>>>>+ /* Pretend we have VFS capabilities */
>>>>+ cap_set_full(bprm->cap_inheritable);
>>>
>>>
>>>This looks sketchy.
>>
>>My concept of 'inheritable' is that caps that are _not_ inheritable
>>may never be gained by this task or its children. So a process
>>should normally have all caps inheritable.
>
>
> This is the diff with Posix, which allows a process to inherit a
> capability that it can never exercise. However, it could pass the
> capablity on to someone else who could inherit it.
>
> <snip>
>

So here's where I think we disagree:

Posix (as interpreted by Linux 2.4/2.6) says:
pP' = (fP & cap_bset) | (fI & pI)

So (assuming that set_security did the "obvious" (but dangerous) thing):

pP := "task may enable and use these capabilities"
pE := "task may use these capabilities _now_"
pI := "task may gain these on exec from fI"

fP := "this program gets these caps (module cap_bset)"
fI := "this program gets these caps if pI says so"

Which screams "overcomplicated." I imagine that the use is for a
trusted daemon to run an untrusted helper (with pI>0) which then
calls back into trusted land and gets its caps back. This is
possibly convenient, but it seems to break (where break = scare me)
when there are more than one such system on a given box. Then one's
untrusted program (with fI>0) can call the other's trusted fI>0
helper. I suppose the point is that a random user's program (with fI==0)
can't try to exploit anything, but, for this to be at all secure, both
fI>0 programs need to be secure against attack from the other (unrelated)
system, should it be compromised. Which means it might as well have set
fP>0 and been done with it (I don't believe in security by inconvenience
of exploit).

I see no particular invariants here, except for pE <= pP.

IRIX (thanks Valdis) says:

pI' = pI & fI

pP' = fP | (pI' & fP)

Which I interpret as

pP := "task may enable and use these capabilities"
pE := "task may use these capabilities _now_"
pI := ~"task _loses_ these on exec"

fP := "this program gets these caps"
fI := "this program may keep these caps"

This seems to want pP <= pI as an invariant.

This is what I always thought Linux capabilities meant to be. They
don't make my brain hurt.

But I also think they're overengineered. Instead of:

drop_caps_from_inheritable()
exec()

a program could do:

drop_caps_from_permitted()
exec()

And I can't imagine what use fI != ~0 has, since it's trivially
accomplished by a wrapper. It is also trivially bypassed by
loading the program manually (with ld.so).

So, in my patch, I decided steal the inheritable mask to mean this:

pI := "this process may gain these caps"
fI := "this is an upper bound on pI"

In other words, if a process is extra-untrusted (e.g. it's a daemon
that never needs a certain capability and has no business trying
to gain it), it can drop it from pI. Then it cannot try to abuse
programs with pP>0. The setuid override is just added paranoia.
Another benefit is that it allows a securelevel-like scheme, where
even root isn't quite trusted.

I suppose it might be inappropriate to steal this field like this,
given that IRIX already has a (somewhat reasonable) use for it. I
have no problem implementing IRIX-style capabilities and (if there
is enough interest) adding a _fourth_ process field pM (process
maximum capabilities) that does what my pI does.

As for the fE mask, I just don't see the point, although I _really_
don't like the way it's described in the IRIX manpage.

IRIX has pE = pP & fE. This breaks Posix non-capabilities
compatibility for a program that's uid==0, euid!=0. It should
have pE==0 and pP>0. But it calls exec() and gets pE>0. This
is bad.

Assuming there's something else there to fix that case,
then I still don't see the point. If a program is capability-
aware, it can set its pE however it likes. If not, then it probably
expects pE==pP. I guess there could be a trusted but dumb program
that runs a trusted, cap-aware helper that needs capabilities.
Then the admin sets fE==0 on the dump program and everything works.
Seems a bit contrived, though.

On the other hand, I'm not wedded to my model of pE. It'll be harder
to get IRIX's right for uid!=euid.

CAP_SYS_PTRACE:

It lets one uid/gid trace another. If CAP_SYS_PTRACE allowed a process
to arbitrarity steal another's capabilities, then the process with
CAP_SYS_PTRACE might as well have been given those capabilities. That is,
this should IMHO be disallowed

drop_all_but_CAP_SYS_PTRACE()
exec(slightly trusted debugger process)
ptrace(1) <--- but it was supposed to be "slightly trusted"!

So:

Should I redo this to keep IRIX's meaning of fI, should I keep mine,
or should I do something else. Chris -- in your examples, you seem
to have some idea of what should be happening. What do you think?

--Andy

Valdis.K...@vt.edu

unread,
May 14, 2004, 2:10:07 AM5/14/04
to
On Fri, 14 May 2004 07:33:32 +0200, Olaf Dietsche said:

> Seems like you're not aware of:
> <http://www.olafdietsche.de/linux/capability/>
>
> This supports filesystem capabilities with the current (POSIX?)
> implementation. So, whatever Andy has shown, it has at least one
> counter evidence q.e.d.

Yes.. I was aware of that.. and I just visited it.. and the VERY TOP it says:

"Filesystem capabilities for linux

This implementation is likely *not* POSIX compatible."

Now who should I believe, you or the author of the page? :)

Valdis.K...@vt.edu

unread,
May 14, 2004, 2:40:04 AM5/14/04
to
On Thu, 13 May 2004 22:40:42 PDT, Chris Wright said:

> OK, this is precisely POSIX as I expected. No surprise given the folks
> involved.

Hmm... Chris? Andy? *Exactly* what version of the draft are you both looking at?

I ask because Andy Lutomirski's draft had *different* production rules:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106687456724871&w=2

And I think everybody managed to miss this gotcha (I know I did):

Albert Calahan reported that apparently a lot of people worked off the N-1 version
of the draft, and the equation that's giving us the trouble got changed:

http://marc.theaimsgroup.com/?l=linux-kernel&m=106687419224640&w=2

I wonder if Andy and I were convinced that the Posix draft 16 that people
worked from was broken because it was, but the Posix draft 17 (that looks
like the SGI stuff) was more correct but didn't get seen by everybody?

Olaf Dietsche

unread,
May 14, 2004, 2:50:04 AM5/14/04
to
Andy Lutomirski <lu...@myrealbox.com> writes:

> I'm not convinced that Posix's version makes any sense. Also, there are
> apparently a number of drafts around which disagree on what the right
> rules are. (My copy, for example, matches the old rules exactly, but
> the old rules caused the sendmail problem.)

Don't confuse POSIX _semantics_ with implementation _bugs_.

> And, under Posix, what does
> the inheritable mask mean, anyway?
>
> Also, I don't find the posix rules to be useful (why is there an
> inheritable mask if all it does is to cause caps to be dropped on
> exec, when the user could just manually drop them?).

You can use the inheritable set, if you want to give capabilities to a
process when it's started by an already priviledged parent (e.g. a
root process), but not when it's started by a regular user.

See <http://www.olafdietsche.de/linux/capability/> for an example.

Regards, Olaf.

Olaf Dietsche

unread,
May 14, 2004, 3:20:04 AM5/14/04
to
Valdis.K...@vt.edu writes:

> On Fri, 14 May 2004 07:33:32 +0200, Olaf Dietsche said:
>
>> Seems like you're not aware of:
>> <http://www.olafdietsche.de/linux/capability/>
>>
>> This supports filesystem capabilities with the current (POSIX?)
>> implementation.

This refers to the linux kernel main line.

> Yes.. I was aware of that.. and I just visited it.. and the VERY TOP it says:
>
> "Filesystem capabilities for linux
>
> This implementation is likely *not* POSIX compatible."

This refers to the tools I provide. I should emphasize this on the
page, thank you. My patch doesn't change the rules, how the capability
bits are mingled.

> Now who should I believe, you or the author of the page? :)

Alright, I deserve this for being imprecise. :)

Olaf Dietsche

unread,
May 14, 2004, 7:20:04 AM5/14/04
to
Andy Lutomirski <lu...@myrealbox.com> writes:

> + /* Pretend we have VFS capabilities */
> + cap_set_full(bprm->cap_inheritable);
> + if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
> + cap_set_full(bprm->cap_permitted);
> + else
> + cap_clear(bprm->cap_permitted);

I'd move this to security/commoncap.c:

diff -urN a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c Fri May 14 10:07:28 2004
+++ b/fs/exec.c Fri May 14 12:07:18 2004
@@ -912,13 +912,6 @@
}
}

- /* Pretend we have VFS capabilities */
- cap_set_full(bprm->cap_inheritable);
- if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
- cap_set_full(bprm->cap_permitted);
- else
- cap_clear(bprm->cap_permitted);
-


/* fill in binprm security blob */
retval = security_bprm_set(bprm);
if (retval)

diff -urN a/security/commoncap.c b/security/commoncap.c
--- a/security/commoncap.c Fri May 14 10:07:28 2004
+++ b/security/commoncap.c Fri May 14 12:08:30 2004
@@ -107,8 +107,16 @@



int cap_bprm_set_security (struct linux_binprm *bprm)
{

- if (newcaps)
+ if (newcaps) {


+ /* Pretend we have VFS capabilities */
+ cap_set_full(bprm->cap_inheritable);
+ if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
+ cap_set_full(bprm->cap_permitted);
+ else
+ cap_clear(bprm->cap_permitted);
+

return 0;
+ }

/* Copied from fs/exec.c:prepare_binprm. */

> + /* FIXME: Is this overly harsh on setgid? */
> + if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
> + new_pI != CAP_FULL_SET)
> + bprm->secflags |= BINPRM_SEC_NOELEVATE;
> +
> + if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
> + is_setuid = is_setgid = 0;
> + cap_clear(fP);
> + }

This prevents sendmail from being setuid mail and
cap_net_bind_service=ep.

Regards, Olaf.

Andy Lutomirski

unread,
May 14, 2004, 10:40:09 AM5/14/04
to
Olaf Dietsche wrote:

> Andy Lutomirski <lu...@myrealbox.com> writes:
>
>
>>+ /* Pretend we have VFS capabilities */
>>+ cap_set_full(bprm->cap_inheritable);
>>+ if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
>>+ cap_set_full(bprm->cap_permitted);
>>+ else
>>+ cap_clear(bprm->cap_permitted);
>
>
> I'd move this to security/commoncap.c:

[snip]

I put it in fs/exec.c because I had to add it to binprm anyway
(having commoncap use ->security would break SELinux), and, as
long as it's a permanent member of the structure, it made sense
for it to always be filled in.

>>+ /* FIXME: Is this overly harsh on setgid? */
>>+ if ((bprm->secflags & (BINPRM_SEC_SETUID | BINPRM_SEC_SETGID)) &&
>>+ new_pI != CAP_FULL_SET)
>>+ bprm->secflags |= BINPRM_SEC_NOELEVATE;
>>+
>>+ if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
>>+ is_setuid = is_setgid = 0;
>>+ cap_clear(fP);
>>+ }
>
>
> This prevents sendmail from being setuid mail and
> cap_net_bind_service=ep.

AAAAH! That's right -- in my implementation <some cap>=ep on a file
makes no sense. It's got to be inheritable to be permitted.

On the other hand, that rule could be safely by checking the old pI
as opposed to the new one. The idea is to prevent a process without
full pI from execing (and thus confusing) old setuid apps.


BTW, in your capabilities implementation, why do you do:

# chcap cap_net_bind_service=ei /usr/sbin/named
# inhcaps cap_net_bind_service=i bind:bind /usr/sbin/named

It seems to me that this wants to be:

# inhcaps cap_net_bind_service=eip bind:bind /usr/sbin/named
(not having looked at your user tool)
or
# cap -c cap_net_bind_service=eip -u bind -g bind /usr/sbin/named
(using my tool)

With my patch, that just works (no fs caps necessary). Why should the
admin have to tag a program so it is allowed to inherit caps? (If
named used libexec, then its libexec helpers would have to be similarly
tagged, and, if it used bash, then bash would need the inheritable tag.)

If the answer's because that's how Linux cap evolution works, then
I'd say that Linux cap evolution is broken.

In any case, I'll probably redo my patch to restore the IRIX version of pI.

--Andy

Albert Cahalan

unread,
May 14, 2004, 10:40:13 AM5/14/04
to
Andy Lutomirski writes:

> Posix (as interpreted by Linux 2.4/2.6) says:
> pP' = (fP & cap_bset) | (fI & pI)
>
> So (assuming that set_security did the "obvious" (but dangerous) thing):
>
> pP := "task may enable and use these capabilities"
> pE := "task may use these capabilities _now_"
> pI := "task may gain these on exec from fI"
>
> fP := "this program gets these caps (module cap_bset)"
> fI := "this program gets these caps if pI says so"
>
> Which screams "overcomplicated."

People will screw this up.

That you should even have to explain the names as you do is an
indication that the names are poor.

Having names that match IRIX names but act differently is
a sure path to disaster via confused admins and developers.

> I see no particular invariants here, except for pE <= pP.
>
> IRIX (thanks Valdis) says:
>
> pI' = pI & fI
> pP' = fP | (pI' & fP)
>
> Which I interpret as
>
> pP := "task may enable and use these capabilities"
> pE := "task may use these capabilities _now_"
> pI := ~"task _loses_ these on exec"
>
> fP := "this program gets these caps"
> fI := "this program may keep these caps"
>
> This seems to want pP <= pI as an invariant.
>
> This is what I always thought Linux capabilities meant to be. They
> don't make my brain hurt.
>
> But I also think they're overengineered. Instead of:
>
> drop_caps_from_inheritable()
> exec()
>
> a program could do:
>
> drop_caps_from_permitted()
> exec()
>
> And I can't imagine what use fI != ~0 has, since it's trivially
> accomplished by a wrapper. It is also trivially bypassed by
> loading the program manually (with ld.so).

Good point. Even if an exec-only executable would block this,
nearly all admins will fail to mark it as such.

> So, in my patch, I decided steal the inheritable mask to mean this:
>
> pI := "this process may gain these caps"
> fI := "this is an upper bound on pI"
>
> In other words, if a process is extra-untrusted (e.g. it's a daemon
> that never needs a certain capability and has no business trying
> to gain it), it can drop it from pI. Then it cannot try to abuse
> programs with pP>0. The setuid override is just added paranoia.
> Another benefit is that it allows a securelevel-like scheme, where
> even root isn't quite trusted.
>
> I suppose it might be inappropriate to steal this field like this,
> given that IRIX already has a (somewhat reasonable) use for it. I
> have no problem implementing IRIX-style capabilities and (if there
> is enough interest) adding a _fourth_ process field pM (process
> maximum capabilities) that does what my pI does.

A few mostly-unrelated thoughts:

Rather than adding a compile-time option or boot option, simply
change the syscall numbers and /proc/*/status names. This will
cause any existing capability-aware tools to act as if being
run on a pre-capability Linux kernel. This seems to be safer
than allowing these tools to assume that nothing has changed.

Allow me to mark an executable with a set of capabilities that
must be all set or all unset. Default to ~0ull for setuid apps,
and to 0ull for all other apps.
Like this:
if ((pFOO & fBAR) != fBAR) pFOO &= ~fBAR;

Before writing the kernel code, how about writing documentation
for admins, software developers, and Linux vendors? Until clear
and readable documentation exists, this is all just a hazard.
You might even make a mistake in the kernel code if it isn't
easy for a non-kernel hacker to review how things will interact
with their software. Just listing all the cases that need to
be reviewed would be an improvement.

This would be an excellent time to reconsider how capabilities
are assigned to bits. You're breaking things anyway; you might
as well do all the breaking at once. I want local-use bits so
that the print queue management access isn't by magic UID/GID.
We haven't escaped UID-as-priv if server apps and setuid apps
are still making UID-based access control decisions.

Andy Lutomirski

unread,
May 14, 2004, 11:00:15 AM5/14/04
to
Albert Cahalan wrote:

That sounds nasty to get right. What about just simulating this
for the benefit of old tools:

pP = (what it really is)
pE = (what it really is -- but it will mostly be pP)
pI = 0 (or full -- anything else is confusing)

Then capset for on other pids (which currently needs CAP_SETPCAP)
goes away completely, since, on stock Linux, that code is
unreachable anyway.

I think this emulates the current linux caps quite nicely. Unless
we add lots more caps, in which case we'd have to make a guess at
what pP is.

Now this code becomes complicated (code-wise, not just conceptually :)

>
> Allow me to mark an executable with a set of capabilities that
> must be all set or all unset. Default to ~0ull for setuid apps,
> and to 0ull for all other apps.
> Like this:
> if ((pFOO & fBAR) != fBAR) pFOO &= ~fBAR;

Once again bypassable by ld.so. What about
if ((pM' & fBAR) != fBAR) ignore fP and setuid
(BINPRM_SEC_NO_ELEVATE)?

I'll call it fR (for required). The default would be:
setuid-root: fR = full
setuid-nonroot and setgid: fR = CAP_SETUID

I check pM' not pP' because in the normal case these capabilities
were in fP anyway (except for setgid/setuid-nonroot, in which case
the admin probably doesn't want the low-pM task changing its uid).
If they weren't, then IMHO it's the job of the task with elevated
caps to be careful of what it runs rather than the filesystem.

So we get:

fM: maximum caps for this app and children (default full)
fP: app gains these (subject to fM and pM)
fR: as above

pM: maximum caps for this task and children (full for init)
pP: the usual
pE: the usual

>
> Before writing the kernel code, how about writing documentation
> for admins, software developers, and Linux vendors? Until clear
> and readable documentation exists, this is all just a hazard.
> You might even make a mistake in the kernel code if it isn't
> easy for a non-kernel hacker to review how things will interact
> with their software. Just listing all the cases that need to
> be reviewed would be an improvement.

Good call. I'll do that.

>
> This would be an excellent time to reconsider how capabilities
> are assigned to bits. You're breaking things anyway; you might
> as well do all the breaking at once. I want local-use bits so
> that the print queue management access isn't by magic UID/GID.
> We haven't escaped UID-as-priv if server apps and setuid apps
> are still making UID-based access control decisions.
>

How many bits? Or should it even be a bitmask?

I'm thinking either 64 or 128 for kernel-defined caps and either
a seperate 128 bits or more or just a list for local-defined.

Then local caps could live in /etc (i don't like it) or /proc (not so
nice either) for collision-avoidance.

Or even numbered bits for kernel use and _names_ for user use.
You'd do sys_cap_register('printquota') and then that becomes
a legal name of a capability. Then you need sys_cap_unregister()
to atomically remove it from all tasks.

I'll think about how my maximum mask fits in to all this.


Thanks,
Andy

Stephen Smalley

unread,
May 14, 2004, 11:30:11 AM5/14/04
to
On Fri, 2004-05-14 at 08:03, Albert Cahalan wrote:
> This would be an excellent time to reconsider how capabilities
> are assigned to bits. You're breaking things anyway; you might
> as well do all the breaking at once. I want local-use bits so
> that the print queue management access isn't by magic UID/GID.
> We haven't escaped UID-as-priv if server apps and setuid apps
> are still making UID-based access control decisions.

Capabilities are a broken model for privilege management; try RBAC/TE
instead. http://www.securecomputing.com/pdf/secureos.pdf has notes
about the history and comparison of capabilities vs. TE.

Instead of adding new capability bits, replace capable() calls with LSM
hook calls that offer you finer granularity both in operation and in
object-based decisions, and then use a security module like SELinux to
map that to actual permission checks. SELinux provides a framework for
defining security classes and permissions, including both definitions
used by the kernel and definitions used by userspace policy enforcers
(ala security-enhanced X).

--
Stephen Smalley <s...@epoch.ncsc.mil>
National Security Agency

Andy Lutomirski

unread,
May 14, 2004, 12:00:15 PM5/14/04
to

Stephen Smalley wrote:

> On Fri, 2004-05-14 at 08:03, Albert Cahalan wrote:
>
>>This would be an excellent time to reconsider how capabilities
>>are assigned to bits. You're breaking things anyway; you might
>>as well do all the breaking at once. I want local-use bits so
>>that the print queue management access isn't by magic UID/GID.
>>We haven't escaped UID-as-priv if server apps and setuid apps
>>are still making UID-based access control decisions.
>
>
> Capabilities are a broken model for privilege management; try RBAC/TE
> instead. http://www.securecomputing.com/pdf/secureos.pdf has notes
> about the history and comparison of capabilities vs. TE.
>
> Instead of adding new capability bits, replace capable() calls with LSM
> hook calls that offer you finer granularity both in operation and in
> object-based decisions, and then use a security module like SELinux to
> map that to actual permission checks. SELinux provides a framework for
> defining security classes and permissions, including both definitions
> used by the kernel and definitions used by userspace policy enforcers
> (ala security-enhanced X).
>

Thanks -- turning brain back on, SELinux is obviously better than any
fine-grained capability scheme I can imagine.

So unless anyone convinces me you're wrong, I'll stick with just
fixing up capabilities to work without making them finer-grained.

--Andy

Stephen Smalley

unread,
May 14, 2004, 12:10:11 PM5/14/04
to
On Fri, 2004-05-14 at 11:57, Andy Lutomirski wrote:
> Thanks -- turning brain back on, SELinux is obviously better than any
> fine-grained capability scheme I can imagine.
>
> So unless anyone convinces me you're wrong, I'll stick with just
> fixing up capabilities to work without making them finer-grained.

Great, thanks. Fixing capabilities to work is definitely useful and
desirable. Significantly expanding them in any manner is a poor use of
limited resources, IMHO; I'd much rather see people work on applying
SELinux to the problem and solving it more effectively for the future.

--
Stephen Smalley <s...@epoch.ncsc.mil>
National Security Agency

-

Andy Lutomirski

unread,
May 14, 2004, 12:30:06 PM5/14/04
to
Stephen Smalley wrote:

> On Fri, 2004-05-14 at 11:57, Andy Lutomirski wrote:
>
>>Thanks -- turning brain back on, SELinux is obviously better than any
>>fine-grained capability scheme I can imagine.
>>
>>So unless anyone convinces me you're wrong, I'll stick with just
>>fixing up capabilities to work without making them finer-grained.
>
>
> Great, thanks. Fixing capabilities to work is definitely useful and
> desirable. Significantly expanding them in any manner is a poor use of
> limited resources, IMHO; I'd much rather see people work on applying
> SELinux to the problem and solving it more effectively for the future.
>

Does this mean I should trash my 'maximum' mask?

(I like 'cap -c = sftp-server' so it can't try to run setuid/fP apps.)
OTOH, since SELinux accomplishes this better, it may not be worth the
effort.

--Andy

Stephen Smalley

unread,
May 14, 2004, 12:50:04 PM5/14/04
to
On Fri, 2004-05-14 at 12:18, Andy Lutomirski wrote:
> Does this mean I should trash my 'maximum' mask?
>
> (I like 'cap -c = sftp-server' so it can't try to run setuid/fP apps.)
> OTOH, since SELinux accomplishes this better, it may not be worth the
> effort.

Not my call to make, but I'm not sure it is worthwhile. Even filesystem
capabilities are questionable IMHO, as you can already authorize
capabilities based on program and call chain (typically just the
immediate caller's domain, but potentially encoding the entire call
chain in the domain transition rules to track any untrusted components
in the call chain) today using SELinux TE. Today, the program still has
to be setuid as well, as we currently check both ordinary Linux
capability and SELinux permission for each operation to be conservative,
but eventually we would hope to drop the use of the capability module as
a secondary module and let SELinux manage the privileges entirely (once
the TE policy configuration is fully locked down and userland is fully
aware of SELinux).

--
Stephen Smalley <s...@epoch.ncsc.mil>
National Security Agency

-

Message has been deleted
Message has been deleted

Chris Wright

unread,
May 14, 2004, 2:00:11 PM5/14/04
to
* Albert Cahalan (alb...@users.sourceforge.net) wrote:
> This would be an excellent time to reconsider how capabilities
> are assigned to bits. You're breaking things anyway; you might
> as well do all the breaking at once. I want local-use bits so
> that the print queue management access isn't by magic UID/GID.
> We haven't escaped UID-as-priv if server apps and setuid apps
> are still making UID-based access control decisions.

This is too volatile in stable series.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Chris Wright

unread,
May 14, 2004, 2:10:07 PM5/14/04
to
* Andy Lutomirski (lu...@myrealbox.com) wrote:
> > This would be an excellent time to reconsider how capabilities
> > are assigned to bits. You're breaking things anyway; you might
> > as well do all the breaking at once. I want local-use bits so
> > that the print queue management access isn't by magic UID/GID.
> > We haven't escaped UID-as-priv if server apps and setuid apps
> > are still making UID-based access control decisions.
>
> How many bits? Or should it even be a bitmask?
>
> I'm thinking either 64 or 128 for kernel-defined caps and either
> a seperate 128 bits or more or just a list for local-defined.

Starts to look like the list of LSM callbacks. Making it bigger doesn't
help the simple issue, keep one lousy bit across execve(). All this
redesign seems wrong to do in 2.6.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Chris Wright

unread,
May 14, 2004, 2:10:09 PM5/14/04
to
* Stephen Smalley (s...@epoch.ncsc.mil) wrote:
> On Fri, 2004-05-14 at 08:03, Albert Cahalan wrote:
> > This would be an excellent time to reconsider how capabilities
> > are assigned to bits. You're breaking things anyway; you might
> > as well do all the breaking at once. I want local-use bits so
> > that the print queue management access isn't by magic UID/GID.
> > We haven't escaped UID-as-priv if server apps and setuid apps
> > are still making UID-based access control decisions.
>
> Capabilities are a broken model for privilege management; try RBAC/TE
> instead. http://www.securecomputing.com/pdf/secureos.pdf has notes
> about the history and comparison of capabilities vs. TE.
>
> Instead of adding new capability bits, replace capable() calls with LSM
> hook calls that offer you finer granularity both in operation and in
> object-based decisions, and then use a security module like SELinux to
> map that to actual permission checks. SELinux provides a framework for
> defining security classes and permissions, including both definitions
> used by the kernel and definitions used by userspace policy enforcers
> (ala security-enhanced X).

exactly!

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Stephen Smalley

unread,
May 14, 2004, 2:10:11 PM5/14/04
to
On Fri, 2004-05-14 at 11:19, Albert Cahalan wrote:
> I just read that. It's a very unfair marketing document.
> Among other things, it suggests that a capability system
> is stuck with about 40 bits while their own version of
> capabilities (a duck is a duck...) has 80 bits. Lovely,
> but not exactly groundbreaking.

You missed the point. Capability scheme maps far too many operations to
a single capability; CAP_SYS_ADMIN in Linux is a good example. TE model
defers organization of operations into equivalence classes to the policy
writer.

> There is the bit about
> a 3-argument security call, but a careful reading will
> reveal that one argument is unused (NULL?) when dealing
> with abilities like "can set the clock".

But very useful when dealing with CAP_DAC_OVERRIDE and friends.

> About the only thing of interest is that capability
> transitions can be arbitrary. You're not limited to
> an obscure set of equations that nobody can agree on.

Because there is no one-size-fits-all equation for all transitions.

> The cost: complicated site-specific config files and
> the inability to support capability-aware apps that
> set+clear their own bits.

Complexity pushed to userspace, where it can be analyzed appropriately
via tools and tailored for the transition in question. Central
management of the capabilities based on equivalence classes (types), as
opposed to having to manage a distributed nightmare of capability bits
on your filesystems. Applications that can transition to other domains,
but only if so authorized by the policy.

> Eh, why? That's mostly a search-and-replace on the name,
> since capable() makes a perfectly fine LSM hook.

It doesn't offer sufficient granularity, either for operation (which you
_could_ address by introducing new capabilities, but the hooks are more
easily extensible) or for object.

> So what about the old-Oracle problem? You have a
> server that needs the ability to hog and lock memory.
> Is there an almost-empty SELinux policy that would
> provide this while leaving the rest of the system
> acting as UNIX-like systems have always acted?
> If so, we have a winner.

See "relaxed policy" or "targeted policy" in recent discussions on the
Fedora mailing lists; coming soon to a rawhide near you. Not exactly
the same thing (it is a policy where most processes run essentially
unconfined except for a targeted set that you define like apache, bind,
etc), but you could certainly tweak it to this end (i.e. you put oracle
into a domain that gives it the one capability you desire).

> One still does need to provide apps with a way to
> answer "can I do FOO, BAR, and BAZ?" and "am I
> running with elevated privileges?". Some way to
> dispose of unneeded privileges would be good too.
> Hopefully extra libraries wouldn't be needed.

SELinux provides a policy API already, and a userspace library for
accessing it (and caching decisions from it). It also provides (via the
AT_SECURE auxv entry) notification that a domain transition has
occurred, and this is already used by glibc secure mode. Privileges are
dropped via domain transitions.

Chris Wright

unread,
May 14, 2004, 2:20:09 PM5/14/04
to
* Andy Lutomirski (lu...@stanford.edu) wrote:
> Stephen Smalley wrote:
> > On Fri, 2004-05-14 at 11:57, Andy Lutomirski wrote:
> > > Thanks -- turning brain back on, SELinux is obviously better than any
> > > fine-grained capability scheme I can imagine.
> > >
> > > So unless anyone convinces me you're wrong, I'll stick with just
> > > fixing up capabilities to work without making them finer-grained.
> >
> > Great, thanks. Fixing capabilities to work is definitely useful and
> > desirable. Significantly expanding them in any manner is a poor use of
> > limited resources, IMHO; I'd much rather see people work on applying
> > SELinux to the problem and solving it more effectively for the future.
>
> Does this mean I should trash my 'maximum' mask?
>
> (I like 'cap -c = sftp-server' so it can't try to run setuid/fP apps.)
> OTOH, since SELinux accomplishes this better, it may not be worth the
> effort.

Let's just get back to the simplest task. Allow execve() to do smth.
reasonable with capabilities.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Albert Cahalan

unread,
May 14, 2004, 4:00:49 PM5/14/04
to
On Fri, 2004-05-14 at 14:06, Stephen Smalley wrote:
> On Fri, 2004-05-14 at 11:19, Albert Cahalan wrote:
> > I just read that. It's a very unfair marketing document.
> > Among other things, it suggests that a capability system
> > is stuck with about 40 bits while their own version of
> > capabilities (a duck is a duck...) has 80 bits. Lovely,
> > but not exactly groundbreaking.
>
> You missed the point. Capability scheme maps far too
> many operations to a single capability; CAP_SYS_ADMIN
> in Linux is a good example.

What I said: lovely, but not exactly groundbreaking.

Suppose we break up CAP_SYS_ADMIN into 41 other bits.
There you go. It's silly to argue that a system with
more bits is some kind of major advance over one with
just a few bits. There is a quality-of-implementation
issue here, not some fundamental difference.

> TE model
> defers organization of operations into equivalence
> classes to the policy writer.

I don't see anything special here either. With a
plain capability-bit system, you could allow for
user-defined aliases that map to multiple bits.
In some random /etc config file:

define ADMIN := FOO | BAR | BAZ

> > There is the bit about
> > a 3-argument security call, but a careful reading will
> > reveal that one argument is unused (NULL?) when dealing
> > with abilities like "can set the clock".
>
> But very useful when dealing with CAP_DAC_OVERRIDE and friends.

I suppose so, but that isn't the interesting case.
We already have SELinux for that kind of thing.

> > About the only thing of interest is that capability
> > transitions can be arbitrary. You're not limited to
> > an obscure set of equations that nobody can agree on.
>
> Because there is no one-size-fits-all equation for all transitions.
>
> > The cost: complicated site-specific config files and
> > the inability to support capability-aware apps that
> > set+clear their own bits.
>
> Complexity pushed to userspace, where it can be analyzed appropriately
> via tools and tailored for the transition in question. Central
> management of the capabilities based on equivalence classes (types), as
> opposed to having to manage a distributed nightmare of capability bits
> on your filesystems. Applications that can transition to other domains,
> but only if so authorized by the policy.

Complexity is pushed to admins, most of which are less
clueful than one might hope.

> > Eh, why? That's mostly a search-and-replace on the name,
> > since capable() makes a perfectly fine LSM hook.
>
> It doesn't offer sufficient granularity, either for operation (which you
> _could_ address by introducing new capabilities, but the hooks are more
> easily extensible) or for object.

SELinux does, I hope, already deal with anything that
would involve an "object". So it is of little concern
what CAP_DAC_OVERRIDE does. What about setting the clock,
hogging memory, using the FIFO scheduler, and so on?

Lack of granularity is an implementation detail.
(Blame the SGI folks that wouldn't listen to me.)
Lack of granularity is not a design flaw.

> > So what about the old-Oracle problem? You have a
> > server that needs the ability to hog and lock memory.
> > Is there an almost-empty SELinux policy that would
> > provide this while leaving the rest of the system
> > acting as UNIX-like systems have always acted?
> > If so, we have a winner.
>
> See "relaxed policy" or "targeted policy" in recent discussions on the
> Fedora mailing lists; coming soon to a rawhide near you. Not exactly
> the same thing (it is a policy where most processes run essentially
> unconfined except for a targeted set that you define like apache, bind,
> etc), but you could certainly tweak it to this end (i.e. you put oracle
> into a domain that gives it the one capability you desire).

What I'm looking for:

1. configure the kernel by ...
2. ensure that /bin/foo runs early in boot
3. put "blah blah blah" in /etc/foo.conf

That is, is there a small set of simple config files
and binaries that I could just slap onto an existing
system to ensure that a particular app is granted an
extra capability bit?

If yes, then the ugly old-Oracle hack is not needed.

> > One still does need to provide apps with a way to
> > answer "can I do FOO, BAR, and BAZ?" and "am I
> > running with elevated privileges?". Some way to
> > dispose of unneeded privileges would be good too.
> > Hopefully extra libraries wouldn't be needed.
>
> SELinux provides a policy API already, and a userspace library for
> accessing it (and caching decisions from it). It also provides (via the
> AT_SECURE auxv entry) notification that a domain transition has
> occurred, and this is already used by glibc secure mode. Privileges are
> dropped via domain transitions.

I take that as a "no" then, at least from the viewpoint
of a normal app author.

Chris Wright

unread,
May 14, 2004, 5:20:07 PM5/14/04
to
* Albert Cahalan (alb...@users.sourceforge.net) wrote:
> On Fri, 2004-05-14 at 14:06, Stephen Smalley wrote:
> > You missed the point. Capability scheme maps far too
> > many operations to a single capability; CAP_SYS_ADMIN
> > in Linux is a good example.
>
> What I said: lovely, but not exactly groundbreaking.
>
> Suppose we break up CAP_SYS_ADMIN into 41 other bits.
> There you go. It's silly to argue that a system with
> more bits is some kind of major advance over one with
> just a few bits. There is a quality-of-implementation
> issue here, not some fundamental difference.

Needing more bits isn't the only problem.

> > TE model
> > defers organization of operations into equivalence
> > classes to the policy writer.
>
> I don't see anything special here either. With a
> plain capability-bit system, you could allow for
> user-defined aliases that map to multiple bits.
> In some random /etc config file:
>
> define ADMIN := FOO | BAR | BAZ

This doesn't effect the inflexibility of a single definition for domain
transistion that's inherent in the capabilty system.

> Lack of granularity is an implementation detail.
> (Blame the SGI folks that wouldn't listen to me.)
> Lack of granularity is not a design flaw.

It's a design flaw. More bits won't help. There's an important missing
piece...credentials for both subject and object. Both of which can be
dynamic, and differ per subject's view of an object.

> What I'm looking for:
>
> 1. configure the kernel by ...
> 2. ensure that /bin/foo runs early in boot
> 3. put "blah blah blah" in /etc/foo.conf
>
> That is, is there a small set of simple config files
> and binaries that I could just slap onto an existing
> system to ensure that a particular app is granted an
> extra capability bit?
>
> If yes, then the ugly old-Oracle hack is not needed.

Nearly. There's the minor issue that execve() clears that bit more
agressively than desired for non-root processes. Otherwise pam can do
it with pam_cap. Which is all we're trying to fix here.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Albert Cahalan

unread,
May 14, 2004, 6:00:12 PM5/14/04
to
On Fri, 2004-05-14 at 17:11, Chris Wright wrote:
> * Albert Cahalan (alb...@users.sourceforge.net) wrote:
> > On Fri, 2004-05-14 at 14:06, Stephen Smalley wrote:
> > > You missed the point. Capability scheme maps far too
> > > many operations to a single capability; CAP_SYS_ADMIN
> > > in Linux is a good example.
> >
> > What I said: lovely, but not exactly groundbreaking.
> >
> > Suppose we break up CAP_SYS_ADMIN into 41 other bits.
> > There you go. It's silly to argue that a system with
> > more bits is some kind of major advance over one with
> > just a few bits. There is a quality-of-implementation
> > issue here, not some fundamental difference.
>
> Needing more bits isn't the only problem.

That's what much of the document went on about. The
rest of the document was mostly generic MAC concepts.

> > > TE model
> > > defers organization of operations into equivalence
> > > classes to the policy writer.
> >
> > I don't see anything special here either. With a
> > plain capability-bit system, you could allow for
> > user-defined aliases that map to multiple bits.
> > In some random /etc config file:
> >
> > define ADMIN := FOO | BAR | BAZ
>
> This doesn't effect the inflexibility of a single definition for domain
> transistion that's inherent in the capabilty system.

Sure. I already noted this.

> > Lack of granularity is an implementation detail.
> > (Blame the SGI folks that wouldn't listen to me.)
> > Lack of granularity is not a design flaw.
>
> It's a design flaw. More bits won't help. There's an important missing
> piece...credentials for both subject and object. Both of which can be
> dynamic, and differ per subject's view of an object.

There is no meaningful object.

subject: process 12345
object: ??????
operation: lock memory

For a few capability bits, there is a meaningful object
and you could use SELinux in place of the capability bits.
For most of the capability bits, this is not the case.

> > What I'm looking for:
> >
> > 1. configure the kernel by ...
> > 2. ensure that /bin/foo runs early in boot
> > 3. put "blah blah blah" in /etc/foo.conf
> >
> > That is, is there a small set of simple config files
> > and binaries that I could just slap onto an existing
> > system to ensure that a particular app is granted an
> > extra capability bit?
> >
> > If yes, then the ugly old-Oracle hack is not needed.
>
> Nearly. There's the minor issue that execve() clears that bit more
> agressively than desired for non-root processes. Otherwise pam can do
> it with pam_cap. Which is all we're trying to fix here.

Stephen Smalley suggested that SELinux could take the
place of our capability bits. So I'm asking how you do
that, using the most minimal SELinux config.

If he really has a way, then there is no need to change
the execve() behavior in the Linux 2.6.x kernels.

Perhaps we just need an LSM hook to re-add the capability
bit after execve() drops it. That's a tiny change that
doesn't affect any existing system.

Andy Lutomirski

unread,
May 14, 2004, 7:00:11 PM5/14/04
to
On Friday 14 May 2004 11:07, Chris Wright wrote:
> * Andy Lutomirski (lu...@stanford.edu) wrote:
> > Stephen Smalley wrote:
> > > On Fri, 2004-05-14 at 11:57, Andy Lutomirski wrote:
> > > > Thanks -- turning brain back on, SELinux is obviously better than any
> > > > fine-grained capability scheme I can imagine.
> > > >
> > > > So unless anyone convinces me you're wrong, I'll stick with just
> > > > fixing up capabilities to work without making them finer-grained.
> > >
> > > Great, thanks. Fixing capabilities to work is definitely useful and
> > > desirable. Significantly expanding them in any manner is a poor use of
> > > limited resources, IMHO; I'd much rather see people work on applying
> > > SELinux to the problem and solving it more effectively for the future.
> >
> > Does this mean I should trash my 'maximum' mask?
> >
> > (I like 'cap -c = sftp-server' so it can't try to run setuid/fP apps.)
> > OTOH, since SELinux accomplishes this better, it may not be worth the
> > effort.
>
> Let's just get back to the simplest task. Allow execve() to do smth.
> reasonable with capabilities.

Sold.

This version throws out the inheritable mask. There is no change in
behavior with newcaps=0.

All that's left in newcaps=1 mode is:

fP := permitted (app gains these)
pP := permitted (app can make them effective)
pE := effective

There is no inheritable mask :) So we can't argue about what it's
supposed to / not supposed to.

It's not too well tested yet. Enjoy.

--Andy

cap_2.6.6-mm2_4.patch: New stripped-back capabilities.

fs/exec.c | 15 ++++-
include/linux/binfmts.h | 9 ++-
security/commoncap.c | 130 ++++++++++++++++++++++++++++++++++++++++++------
3 files changed, 136 insertions(+), 18 deletions(-)

--- linux-2.6.6-mm2/fs/exec.c~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/fs/exec.c 2004-05-14 12:36:17.000000000 -0700
@@ -882,8 +882,10 @@

if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
/* Set-uid? */
- if (mode & S_ISUID)
+ if (mode & S_ISUID) {
bprm->e_uid = inode->i_uid;
+ bprm->secflags |= BINPRM_SEC_SETUID;
+ }

/* Set-gid? */
/*
@@ -891,10 +893,18 @@
* is a candidate for mandatory locking, not a setgid
* executable.
*/
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
bprm->e_gid = inode->i_gid;
+ bprm->secflags |= BINPRM_SEC_SETGID;
+ }


}

+ /* Pretend we have VFS capabilities */

+ if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
+ cap_set_full(bprm->cap_permitted);
+ else
+ cap_clear(bprm->cap_permitted);

+


/* fill in binprm security blob */
retval = security_bprm_set(bprm);
if (retval)

@@ -1089,6 +1099,7 @@
bprm.loader = 0;
bprm.exec = 0;
bprm.security = NULL;
+ bprm.secflags = 0;
bprm.mm = mm_alloc();
retval = -ENOMEM;
if (!bprm.mm)
--- linux-2.6.6-mm2/security/commoncap.c~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/security/commoncap.c 2004-05-14 13:24:45.000000000 -0700
@@ -24,6 +24,11 @@
#include <linux/xattr.h>
#include <linux/hugetlb.h>

+static int newcaps = 0;
+
+module_param(newcaps, int, 444);
+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
+
int cap_capable (struct task_struct *tsk, int cap)
{
/* Derived from include/linux/sched.h:capable. */
@@ -48,17 +53,19 @@
{
/* Derived from kernel/capability.c:sys_capget. */
*effective = cap_t (target->cap_effective);
- *inheritable = cap_t (target->cap_inheritable);
*permitted = cap_t (target->cap_permitted);
+ if (newcaps)
+ *inheritable = CAP_EMPTY_SET;
+ else
+ *inheritable = cap_t (target->cap_inheritable);
return 0;
}

int cap_capset_check (struct task_struct *target, kernel_cap_t *effective,
kernel_cap_t *inheritable, kernel_cap_t *permitted)
{
- /* Derived from kernel/capability.c:sys_capset. */
/* verify restrictions on target's new Inheritable set */
- if (!cap_issubset (*inheritable,
+ if (!newcaps && !cap_issubset (*inheritable,
cap_combine (target->cap_inheritable,
current->cap_permitted))) {
return -EPERM;
@@ -83,12 +90,16 @@
kernel_cap_t *inheritable, kernel_cap_t *permitted)
{
target->cap_effective = *effective;
- target->cap_inheritable = *inheritable;
target->cap_permitted = *permitted;
+ if (!newcaps)
+ target->cap_inheritable = *inheritable;


}

int cap_bprm_set_security (struct linux_binprm *bprm)
{

+ if (newcaps)


+ return 0;
+
/* Copied from fs/exec.c:prepare_binprm. */

/* We don't have VFS support for capabilities yet */
@@ -115,9 +126,9 @@
return 0;
}

-void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+static void cap_bprm_apply_creds_compat (struct linux_binprm *bprm, int unsafe)
{
- /* Derived from fs/exec.c:compute_creds. */
+ /* This function will hopefully die in 2.7. */
kernel_cap_t new_permitted, working;

new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
@@ -158,15 +169,88 @@
current->keep_capabilities = 0;
}

+/*
+ * The rules of Linux capabilities (not POSIX!)
+ *
+ * What the masks mean:
+ * pP = capabilities that this process has
+ * pE = capabilities that this process has and are enabled
+ * (so pE <= pP)
+ *
+ * The capability evolution rules are:
+ *
+ * pP' = ((fP & cap_bset) | pP) & Y
+ * pE' = (pE | fP) & pP'
+ *
+ * X = cap_bset
+ * Y is zero if uid!=0, euid==0, and setuid non-root
+ *
+ * Exception: if setuid-nonroot, then pE' is reset to 0.
+ *
+ * If file capabilities are introduced, programs that are granted
+ * fP > 0 need to be aware of how to deal with it.
+ * Because the effective set is left alone on non-setuid fP>0,
+ * such a program should drop capabilities that were not in its initial
+ * effective set before running untrusted code.
+ *
+ */
+void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
+{
+ kernel_cap_t new_pP, new_pE, fP;
+ int is_setuid, is_setgid;
+
+ if(!newcaps) {
+ cap_bprm_apply_creds_compat(bprm, unsafe);
+ return;
+ }
+
+ fP = bprm->cap_permitted;
+ is_setuid = (bprm->secflags & BINPRM_SEC_SETUID);
+ is_setgid = (bprm->secflags & BINPRM_SEC_SETGID);
+
+ /* Check that it's safe to elevate privileges */
+ if (unsafe & ~LSM_UNSAFE_PTRACE_CAP)


+ bprm->secflags |= BINPRM_SEC_NOELEVATE;
+
+ if (bprm->secflags & BINPRM_SEC_NOELEVATE) {
+ is_setuid = is_setgid = 0;
+ cap_clear(fP);
+ }

+
+ new_pP = cap_intersect(fP, cap_bset);
+ new_pP = cap_combine(new_pP, current->cap_permitted);
+
+ /* setuid-nonroot is special. */
+ if (is_setuid && bprm->e_uid != 0 && current->uid != 0 &&
+ current->euid == 0)
+ cap_clear(new_pP);
+
+ if (!cap_issubset(new_pP, current->cap_permitted))
+ bprm->secflags |= BINPRM_SEC_SECUREEXEC;
+
+ new_pE = cap_combine(current->cap_effective, fP);
+ cap_mask(new_pE, new_pP);
+ if (is_setuid && bprm->e_uid != 0)
+ cap_clear(new_pE);
+
+ /* Apply new security state */
+ if (is_setuid) {
+ current->suid = current->euid = current->fsuid = bprm->e_uid;
+ }
+ if (is_setgid)
+ current->sgid = current->egid = current->fsgid = bprm->e_gid;
+
+ current->cap_permitted = new_pP;
+ current->cap_effective = new_pE;
+
+ current->keep_capabilities = 0;
+}
+
int cap_bprm_secureexec (struct linux_binprm *bprm)
{
- /* If/when this module is enhanced to incorporate capability
- bits on files, the test below should be extended to also perform a
- test between the old and new capability sets. For now,
- it simply preserves the legacy decision algorithm used by
- the old userland. */
return (current->euid != current->uid ||
- current->egid != current->gid);
+ current->egid != current->gid ||
+ (bprm->secflags & BINPRM_SEC_SECUREEXEC));
}

int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,
@@ -280,9 +364,14 @@

void cap_task_reparent_to_init (struct task_struct *p)
{
- p->cap_effective = CAP_INIT_EFF_SET;
- p->cap_inheritable = CAP_INIT_INH_SET;
- p->cap_permitted = CAP_FULL_SET;
+ if (newcaps) {
+ cap_set_full(p->cap_permitted);
+ cap_set_full(p->cap_effective);
+ } else {
+ p->cap_effective = CAP_INIT_EFF_SET;
+ p->cap_inheritable = CAP_INIT_INH_SET;
+ p->cap_permitted = CAP_FULL_SET;
+ }
p->keep_capabilities = 0;
return;
}
@@ -367,6 +456,15 @@
return -ENOMEM;
}

+static int __init commoncap_init (void)
+{
+ if (newcaps) {
+ printk(KERN_NOTICE "Experimental capability support is on\n");
+ }
+
+ return 0;
+}
+
EXPORT_SYMBOL(cap_capable);
EXPORT_SYMBOL(cap_ptrace);
EXPORT_SYMBOL(cap_capget);
@@ -382,5 +480,7 @@
EXPORT_SYMBOL(cap_syslog);
EXPORT_SYMBOL(cap_vm_enough_memory);

+module_init(commoncap_init);
+
MODULE_DESCRIPTION("Standard Linux Common Capabilities Security Module");
MODULE_LICENSE("GPL");
--- linux-2.6.6-mm2/include/linux/binfmts.h~caps 2004-05-13 11:42:26.000000000 -0700
+++ linux-2.6.6-mm2/include/linux/binfmts.h 2004-05-14 10:28:05.000000000 -0700
@@ -20,6 +20,10 @@
/*
* This structure is used to hold the arguments that are used when loading binaries.
*/
+#define BINPRM_SEC_SETUID 1
+#define BINPRM_SEC_SETGID 2
+#define BINPRM_SEC_SECUREEXEC 4
+#define BINPRM_SEC_NOELEVATE 8
struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
struct page *page[MAX_ARG_PAGES];
@@ -28,7 +32,10 @@
int sh_bang;
struct file * file;
int e_uid, e_gid;
- kernel_cap_t cap_inheritable, cap_permitted, cap_effective;
+ int secflags;
+ kernel_cap_t cap_permitted;
+ /* old caps -- do NOT use in new code */
+ kernel_cap_t cap_inheritable, cap_effective;
void *security;
int argc, envc;
char * filename; /* Name of binary as seen by procps */

Olaf Dietsche

unread,
May 14, 2004, 8:20:07 PM5/14/04
to
Andy Lutomirski <lu...@myrealbox.com> writes:

> cap_2.6.6-mm2_4.patch: New stripped-back capabilities.
>
> fs/exec.c | 15 ++++-
> include/linux/binfmts.h | 9 ++-
> security/commoncap.c | 130 ++++++++++++++++++++++++++++++++++++++++++------
> 3 files changed, 136 insertions(+), 18 deletions(-)

[patch]

Why don't you provide this as a configurable andycap.c module?
I think, this is the whole point of LSM.

Regards, Olaf.

Chris Wright

unread,
May 14, 2004, 8:50:05 PM5/14/04
to
* Olaf Dietsche (olaf+list.l...@olafdietsche.de) wrote:
> Andy Lutomirski <lu...@myrealbox.com> writes:
>
> > cap_2.6.6-mm2_4.patch: New stripped-back capabilities.
> >
> > fs/exec.c | 15 ++++-
> > include/linux/binfmts.h | 9 ++-
> > security/commoncap.c | 130 ++++++++++++++++++++++++++++++++++++++++++------
> > 3 files changed, 136 insertions(+), 18 deletions(-)
> [patch]
>
> Why don't you provide this as a configurable andycap.c module?
> I think, this is the whole point of LSM.

I agree, if we can't find a clean way to do it. However, note this
includes changes to core. And it's nice to fix this for the base case.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Albert Cahalan

unread,
May 14, 2004, 8:50:06 PM5/14/04
to
On Fri, 2004-05-14 at 20:06, Olaf Dietsche wrote:
> Andy Lutomirski <lu...@myrealbox.com> writes:
>
> > cap_2.6.6-mm2_4.patch: New stripped-back capabilities.
> >
> > fs/exec.c | 15 ++++-
> > include/linux/binfmts.h | 9 ++-
> > security/commoncap.c | 130 ++++++++++++++++++++++++++++++++++++++++++------
> > 3 files changed, 136 insertions(+), 18 deletions(-)
> [patch]
>
> Why don't you provide this as a configurable andycap.c module?

**GROAN**

For those that don't know, this is Andy Cap:
http://picosel.free.fr/andycap/andy_billard.jpg

Olaf Dietsche

unread,
May 15, 2004, 12:00:13 PM5/15/04
to
Andy Lutomirski <lu...@myrealbox.com> writes:

> BTW, in your capabilities implementation, why do you do:

There's no such thing like "my" capabilities implementation. I use the
capabilities implementation of linux main line as it is. My patch
provides the possibility to connect capabilities to files, nothing
else.

I never tried to fix or modify, how capabilities work.

> If the answer's because that's how Linux cap evolution works, then
> I'd say that Linux cap evolution is broken.

Well, it works for me. :-)

Andy Lutomirski

unread,
May 17, 2004, 3:30:10 AM5/17/04
to
Chris Wright wrote:

> * Olaf Dietsche (olaf+list.l...@olafdietsche.de) wrote:
>
>>Andy Lutomirski <lu...@myrealbox.com> writes:
>>
>>
>>>cap_2.6.6-mm2_4.patch: New stripped-back capabilities.
>>>
>>> fs/exec.c | 15 ++++-
>>> include/linux/binfmts.h | 9 ++-
>>> security/commoncap.c | 130 ++++++++++++++++++++++++++++++++++++++++++------
>>> 3 files changed, 136 insertions(+), 18 deletions(-)
>>
>>[patch]
>>
>>Why don't you provide this as a configurable andycap.c module?
>>I think, this is the whole point of LSM.
>
>
> I agree, if we can't find a clean way to do it. However, note this
> includes changes to core. And it's nice to fix this for the base case.

On the other hand, this version has minimal changes to core (it adds a new
field to linux_binprm and makes fs/exec.c fill in some extra information).
These changes shouldn't break any existing code, as the current behavior
is for bprm->cap_* to be undefined when bprm_set_security is called. None
of this is strictly necessary for my patch, but it makes it a lot cleaner.

So, if the core changes were merged, my caps semantics could be maintained
as a (fairly simple) separate LSM. That prevents it working with SELinux
or other (non-stacking) LSMs loaded.

--Andy

Stephen Smalley

unread,
May 17, 2004, 8:10:06 AM5/17/04
to
On Mon, 2004-05-17 at 03:19, Andy Lutomirski wrote:
> So, if the core changes were merged, my caps semantics could be maintained
> as a (fairly simple) separate LSM. That prevents it working with SELinux
> or other (non-stacking) LSMs loaded.

SELinux supports stacking with the existing capability module (SELinux
registers first, then the capability module registers as a secondary
module under it).

--
Stephen Smalley <s...@epoch.ncsc.mil>
National Security Agency

-

Andy Lutomirski

unread,
May 18, 2004, 5:20:09 AM5/18/04
to

Chris Wright wrote:


> * Andy Lutomirski (lu...@myrealbox.com) wrote:
>
>>This version throws out the inheritable mask. There is no change in
>>behavior with newcaps=0.
>
>

> Alright, I tried to write up my expectations for all the various modes
> w.r.t dropping privs, keeping them, setuid apps, etc. I then wrote some
> tests to get a baseline, and well as a way to compare results. Finally
> I wrote a patch that meets the requirements I laid out, and compared it
> against yours. With one minor exception, the capabilities bits match
> up. You drop effective caps when a non-root users execs a non-root
> setuid app, and I the caps alone. ...

Paranoia. There are legacy setuid programs out there (presumably even
setuid-nonroot). Let's make them behave as closely to the way they
currently do as possible.


> ... One side note, you're changes effect
> the user/group saved ids unexpectedly.

Oops. That's a trivial fix.

>
> Here's a bunch of test cases:

> # Still w/out changing inheritable and with KEEPCAPS set
> # 10 Root process does setuid(!0), effective caps are dropped
> # 11 Root process does seteuid(!0), effective caps are dropped
> # 12 Nonroot process restores uid 0, effective restored to permitted

I still want some variant on KEEPCAPS that causes setxuid to leave caps
completely alone. Oh, well.

> # 18 Non-root execs setuid-nonroot process, new caps bound by inheritable

What new caps?

>>cap_2.6.6-mm2_4.patch: New stripped-back capabilities.
>>
>> fs/exec.c | 15 ++++-
>> include/linux/binfmts.h | 9 ++-
>> security/commoncap.c | 130 ++++++++++++++++++++++++++++++++++++++++++------
>> 3 files changed, 136 insertions(+), 18 deletions(-)
>>
>>--- linux-2.6.6-mm2/fs/exec.c~caps 2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/fs/exec.c 2004-05-14 12:36:17.000000000 -0700
>>@@ -882,8 +882,10 @@
>>
>> if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
>> /* Set-uid? */
>>- if (mode & S_ISUID)
>>+ if (mode & S_ISUID) {
>> bprm->e_uid = inode->i_uid;
>>+ bprm->secflags |= BINPRM_SEC_SETUID;
>>+ }
>
>

> No strong objection, but I don't think it's necessary.

I wanted to distinguish between setuid-me and non-setuid in apply_creds.
That one doesn't matter much, though.

>
>
>>
>> /* Set-gid? */
>> /*
>>@@ -891,10 +893,18 @@
>> * is a candidate for mandatory locking, not a setgid
>> * executable.
>> */
>>- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
>>+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
>> bprm->e_gid = inode->i_gid;
>>+ bprm->secflags |= BINPRM_SEC_SETGID;
>>+ }
>> }
>>
>>+ /* Pretend we have VFS capabilities */
>>+ if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
>>+ cap_set_full(bprm->cap_permitted);
>>+ else
>>+ cap_clear(bprm->cap_permitted);
>>+
>
>

> Thus far we've kept all this stuff out of core. I believe we should
> keep it that way. This could easily be left in bprm_set().

True. But as long as linux_binprm has fields for this stuff, intuitively
it should always be filled in (i.e. not just in commoncap). That way we
can say that commoncap doesn't get special treatment :)

Also, this seems like the right place to check for VFS caps. This way we can.

>
>
>>--- linux-2.6.6-mm2/security/commoncap.c~caps 2004-05-13 11:42:26.000000000 -0700
>>+++ linux-2.6.6-mm2/security/commoncap.c 2004-05-14 13:24:45.000000000 -0700
>>@@ -24,6 +24,11 @@
>> #include <linux/xattr.h>
>> #include <linux/hugetlb.h>
>>
>>+static int newcaps = 0;
>>+
>>+module_param(newcaps, int, 444);
>>+MODULE_PARM_DESC(newcaps, "Set newcaps=1 to enable experimental capabilities");
>
>

> It would be really nice to have a one size fits all solution. Esp.
> since the legacy mode is what one gets when leaving inheritable mask
> untouched.

I agree. Andrew specifically asked for this, though, at least until this
stuff is well-tested and everyone likes it.

>
>
>> int cap_bprm_set_security (struct linux_binprm *bprm)
>> {
>>+ if (newcaps)
>>+ return 0;
>>+
>
>

> That setup could go here instead of in core.
>
>

[snip]


>>
>>+/*
>>+ * The rules of Linux capabilities (not POSIX!)
>>+ *
>>+ * What the masks mean:
>>+ * pP = capabilities that this process has
>>+ * pE = capabilities that this process has and are enabled
>>+ * (so pE <= pP)
>>+ *
>>+ * The capability evolution rules are:
>>+ *
>>+ * pP' = ((fP & cap_bset) | pP) & Y
>>+ * pE' = (pE | fP) & pP'
>>+ *
>>+ * X = cap_bset
>>+ * Y is zero if uid!=0, euid==0, and setuid non-root
>>+ *
>>+ * Exception: if setuid-nonroot, then pE' is reset to 0.
>
>

> While this works fine, I was interested to see what we could do to
> leave the old algorithm. Seems both work out fine.


>
>
>>+ /* setuid-nonroot is special. */
>>+ if (is_setuid && bprm->e_uid != 0 && current->uid != 0 &&
>>+ current->euid == 0)
>>+ cap_clear(new_pP);
>
>

> setuid-nonroot from a non-root user needs to clear effective?

Yes. Say user 500 runs a setuid-root program, which goes back and runs a
setuid-500 program. uid==euid==500 now, so the program expects to be
unprivileged. This makes that work. (I'm assuming you meant permitted.
Effective gets cleared in cap_mask(new_pE, new_pP)).


The reason I killed the old algorithm is because it's scary (in the sense
of being complicated and subtle for no good reason) and because I don't see
the point of inheritable caps. I doubt anything uses them currently on a
vanilla kernel because they don't _do_ anything. So I killed them.


--Andy

Chris Wright

unread,
May 18, 2004, 9:30:12 PM5/18/04
to
* Andy Lutomirski (lu...@stanford.edu) wrote:
> Chris Wright wrote:
> > Alright, I tried to write up my expectations for all the various modes
> > w.r.t dropping privs, keeping them, setuid apps, etc. I then wrote some
> > tests to get a baseline, and well as a way to compare results. Finally
> > I wrote a patch that meets the requirements I laid out, and compared it
> > against yours. With one minor exception, the capabilities bits match
> > up. You drop effective caps when a non-root users execs a non-root
> > setuid app, and I the caps alone. ...
>
> Paranoia. There are legacy setuid programs out there (presumably even
> setuid-nonroot). Let's make them behave as closely to the way they
> currently do as possible.

Yes, that's the goal I have. Although, the above scenario, they've
already been limited (IOW, if nothing's been touched, all behaves the
same). Starting with limited inheritable (as say uid 500), execute
a non-root, setuid (say uid 99) program, is this expected to change
effective set? Currently it's a transition for 0 caps to 0 caps, not
very interesting. Given they are both unprivelged uids, I kept
the (limited) effective.

> > # Still w/out changing inheritable and with KEEPCAPS set
> > # 10 Root process does setuid(!0), effective caps are dropped
> > # 11 Root process does seteuid(!0), effective caps are dropped
> > # 12 Nonroot process restores uid 0, effective restored to permitted
>
> I still want some variant on KEEPCAPS that causes setxuid to leave caps
> completely alone. Oh, well.

Yeah, digs hole deeper though.

> > # 18 Non-root execs setuid-nonroot process, new caps bound by inheritable
>
> What new caps?

The caps in the newly exec'd process. IOW, effective aren't dropped,
but as you'd expect inheritable provides limit.

> >>+ /* Pretend we have VFS capabilities */
> >>+ if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
> >>+ cap_set_full(bprm->cap_permitted);
> >>+ else
> >>+ cap_clear(bprm->cap_permitted);
> >>+
> >
> >
> > Thus far we've kept all this stuff out of core. I believe we should
> > keep it that way. This could easily be left in bprm_set().
>
> True. But as long as linux_binprm has fields for this stuff, intuitively
> it should always be filled in (i.e. not just in commoncap). That way we
> can say that commoncap doesn't get special treatment :)
>
> Also, this seems like the right place to check for VFS caps. This way we can.

This does change the current notion of layering. I see your point though,
likening it to say reading inode and finding S_ISUID bit.

Yup, I see. This works in my patch as well. I'll add this test. Also
added test to check disabling priv escalation during ptrace of setuid
app.

> The reason I killed the old algorithm is because it's scary (in the sense
> of being complicated and subtle for no good reason) and because I don't see
> the point of inheritable caps. I doubt anything uses them currently on a
> vanilla kernel because they don't _do_ anything. So I killed them.

This does break all those caps aware apps (yeah, tongue-in-cheek ;-)
that actually have the idea to widen the effective set, yet limit the
inheriable set. Seriously, I don't know how much this matters.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Andy Lutomirski

unread,
May 18, 2004, 9:40:07 PM5/18/04
to
Chris Wright wrote:

> * Chris Wright (chr...@osdl.org) wrote:
>
>>* Andy Lutomirski (lu...@myrealbox.com) wrote:
>>

>>>This version throws out the inheritable mask. There is no change in
>>>behavior with newcaps=0.
>>

>>Alright, I tried to write up my expectations for all the various modes
>>w.r.t dropping privs, keeping them, setuid apps, etc. I then wrote some
>>tests to get a baseline, and well as a way to compare results. Finally
>>I wrote a patch that meets the requirements I laid out, and compared it
>>against yours. With one minor exception, the capabilities bits match
>>up. You drop effective caps when a non-root users execs a non-root

>>setuid app, and I the caps alone. One side note, you're changes effect


>>the user/group saved ids unexpectedly.

Disclaimer: I haven't had a chance to boot this version.

>
>
> The tests are available at:
> http://developer.osdl.org/~chrisw/capabilities/testcap.tar.bz2
>
> patch is there too and is also inline below:
> http://developer.osdl.org/~chrisw/capabilities/2.6.6-mm2/chris_cap.patch
>
> --- linux-2.6.6-mm2-cap/include/linux/capability.h.orig 2004-05-09 19:32:26.000000000 -0700
> +++ linux-2.6.6-mm2-cap/include/linux/capability.h 2004-05-17 23:24:08.143860096 -0700
> @@ -309,7 +309,9 @@
> #define CAP_EMPTY_SET to_cap_t(0)
> #define CAP_FULL_SET to_cap_t(~0)
> #define CAP_INIT_EFF_SET to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))
> -#define CAP_INIT_INH_SET to_cap_t(0)
> +#define CAP_INIT_INH_SET to_cap_t(~0)
> +/* ~0 is legacy inheritable mode and can never be capset by user */
> +#define cap_orig_inh(_cap) (cap_t((_cap)) == ~0)

So how do you say "inherit all caps"?

[...]

> @@ -56,10 +59,13 @@


> int cap_capset_check (struct task_struct *target, kernel_cap_t *effective,
> kernel_cap_t *inheritable, kernel_cap_t *permitted)
> {

> + kernel_cap_t target_inheritable = target->cap_inheritable;
> + if (cap_orig_inh(target_inheritable))
> + target_inheritable = 0;


> /* Derived from kernel/capability.c:sys_capset. */
> /* verify restrictions on target's new Inheritable set */

> if (!cap_issubset (*inheritable,
> - cap_combine (target->cap_inheritable,
> + cap_combine (target_inheritable,
> current->cap_permitted))) {
> return -EPERM;
> }

What stops legacy mode from being reenabled?

[...]

I think you missed the case when root-but-no-caps execs setuid root -- I
don't see anything that would enable secureexec.

--Andy

Andy Lutomirski

unread,
May 18, 2004, 10:00:08 PM5/18/04
to
Chris Wright wrote:

> * Andy Lutomirski (lu...@stanford.edu) wrote:
>
>>Chris Wright wrote:
>>
>>>
>>>
>>>Thus far we've kept all this stuff out of core. I believe we should
>>>keep it that way. This could easily be left in bprm_set().
>>
>>True. But as long as linux_binprm has fields for this stuff, intuitively
>>it should always be filled in (i.e. not just in commoncap). That way we
>>can say that commoncap doesn't get special treatment :)
>>
>>Also, this seems like the right place to check for VFS caps. This way we can.
>
>
> This does change the current notion of layering. I see your point though,
> likening it to say reading inode and finding S_ISUID bit.

On the other hand, no one would put reading of a SELinux security label
here. But we already have fields in binprm specifically for commoncap. I
have no strong preference.


>>The reason I killed the old algorithm is because it's scary (in the sense
>>of being complicated and subtle for no good reason) and because I don't see
>>the point of inheritable caps. I doubt anything uses them currently on a
>>vanilla kernel because they don't _do_ anything. So I killed them.
>
>
> This does break all those caps aware apps (yeah, tongue-in-cheek ;-)
> that actually have the idea to widen the effective set, yet limit the
> inheriable set. Seriously, I don't know how much this matters.

Yah, they're broken anyway right now if that's what they're doing.

The reason I didn't go for something like your approach (other than not
thinking of it) was that, as long as we're changing the semantics, we might
as well make them as clean as possible. I also didn't mind ripping out
lots of old code :). If the inheritable mask stays, then programs need to
be audited for what happens if they are run with different inheritable
masks. I'd rather just eliminate that complication and the corresponding
blob of commoncap code. Obviously my patch fails a lot of your tests as a
result.

So do we arm-wrestle over whose implementation wins? :) I'd say mine wins
on readability (not your fault -- the old code was pretty bad to begin
with) and some simplicity, but yours has the benefit of being less intrusive.

--Andy

Chris Wright

unread,
May 19, 2004, 3:30:17 AM5/19/04
to
* Andy Lutomirski (lu...@myrealbox.com) wrote:
> Chris Wright wrote:
>
> > -#define CAP_INIT_INH_SET to_cap_t(0)
> > +#define CAP_INIT_INH_SET to_cap_t(~0)
> > +/* ~0 is legacy inheritable mode and can never be capset by user */
> > +#define cap_orig_inh(_cap) (cap_t((_cap)) == ~0)
>
> So how do you say "inherit all caps"?

Legacy mode requires effectively reserving a bit, which works in this case
since all 32 aren't used. So inherit all caps is all valid caps (which,
btw, still exludes CAP_SETPCAP). Something like "=eip cap_setpcap-eip"
would inherit all valid caps.

> > @@ -56,10 +59,13 @@
> > int cap_capset_check (struct task_struct *target, kernel_cap_t *effective,
> > kernel_cap_t *inheritable, kernel_cap_t *permitted)
> > {
> > + kernel_cap_t target_inheritable = target->cap_inheritable;
> > + if (cap_orig_inh(target_inheritable))
> > + target_inheritable = 0;
> > /* Derived from kernel/capability.c:sys_capset. */
> > /* verify restrictions on target's new Inheritable set */
> > if (!cap_issubset (*inheritable,
> > - cap_combine (target->cap_inheritable,
> > + cap_combine (target_inheritable,
> > current->cap_permitted))) {
> > return -EPERM;
> > }
>
> What stops legacy mode from being reenabled?

I believe only a kernel thread could do this (and init) since they are the
only ones that could get CAP_SETPCAP. Same threat as it is currently.

> I think you missed the case when root-but-no-caps execs setuid root -- I
> don't see anything that would enable secureexec.

Yup you're right. I liked how you did that in your patch and was just going
to steal that bit ;-)

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Chris Wright

unread,
May 19, 2004, 3:40:06 AM5/19/04
to
* Andy Lutomirski (lu...@myrealbox.com) wrote:
> Chris Wright wrote:
> > This does change the current notion of layering. I see your point though,
> > likening it to say reading inode and finding S_ISUID bit.
>
> On the other hand, no one would put reading of a SELinux security label
> here. But we already have fields in binprm specifically for commoncap. I
> have no strong preference.

Yes, I stopped short of that argument only because capabilities fall
into a bit more of a gray zone than other modules. But I do prefer
leaving it in the module.

> >>The reason I killed the old algorithm is because it's scary (in the sense
> >>of being complicated and subtle for no good reason) and because I don't see
> >>the point of inheritable caps. I doubt anything uses them currently on a
> >>vanilla kernel because they don't _do_ anything. So I killed them.
> >
> > This does break all those caps aware apps (yeah, tongue-in-cheek ;-)
> > that actually have the idea to widen the effective set, yet limit the
> > inheriable set. Seriously, I don't know how much this matters.
>
> Yah, they're broken anyway right now if that's what they're doing.

On Linux they are. On IRIX they aren't. This is part of the issue as I
see it.

> The reason I didn't go for something like your approach (other than not
> thinking of it) was that, as long as we're changing the semantics, we might
> as well make them as clean as possible. I also didn't mind ripping out
> lots of old code :). If the inheritable mask stays, then programs need to
> be audited for what happens if they are run with different inheritable
> masks. I'd rather just eliminate that complication and the corresponding
> blob of commoncap code. Obviously my patch fails a lot of your tests as a
> result.

Actually the only glaring difference (aside from the uid/suid and non-root
execs nonroot-yet-diff-id-setuid-app issue I mentioned earlier) is in
something like "=ep cap_setpcap-ep cap_ipc_lock+i" IIRC.

I have the feeling we both are after the same thing, which is introducing
the ability to keep some caps through exec and still being able to sleep
at night w/ confidence that there isn't some subtle new hole lurking.
This is why I aimed to change as little code as possible.

> So do we arm-wrestle over whose implementation wins? :) I'd say mine wins
> on readability (not your fault -- the old code was pretty bad to begin
> with) and some simplicity, but yours has the benefit of being less intrusive.

Hehe, arm wrestling could be entertaining ;-) I'm in favor of the most
conservative change, which I feel is in my patch. But I'm game to
continue to pick on each.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Andy Lutomirski

unread,
May 23, 2004, 5:40:11 AM5/23/04
to
On Wednesday 19 May 2004 00:30, Chris Wright wrote:
> * Andy Lutomirski (lu...@myrealbox.com) wrote:
> > Chris Wright wrote:

> > The reason I didn't go for something like your approach (other than not
> > thinking of it) was that, as long as we're changing the semantics, we might
> > as well make them as clean as possible. I also didn't mind ripping out
> > lots of old code :). If the inheritable mask stays, then programs need to
> > be audited for what happens if they are run with different inheritable
> > masks. I'd rather just eliminate that complication and the corresponding
> > blob of commoncap code. Obviously my patch fails a lot of your tests as a
> > result.
>
> Actually the only glaring difference (aside from the uid/suid and non-root
> execs nonroot-yet-diff-id-setuid-app issue I mentioned earlier) is in
> something like "=ep cap_setpcap-ep cap_ipc_lock+i" IIRC.
>
> I have the feeling we both are after the same thing, which is introducing
> the ability to keep some caps through exec and still being able to sleep
> at night w/ confidence that there isn't some subtle new hole lurking.
> This is why I aimed to change as little code as possible.
>
> > So do we arm-wrestle over whose implementation wins? :) I'd say mine wins
> > on readability (not your fault -- the old code was pretty bad to begin
> > with) and some simplicity, but yours has the benefit of being less intrusive.
>
> Hehe, arm wrestling could be entertaining ;-) I'm in favor of the most
> conservative change, which I feel is in my patch. But I'm game to
> continue to pick on each.
>

You don't like my patch because it rips out a bunch of code and it's
not clear it won't break stuff.

I don't like your patch because it takes a bunch of incomprehensible
code that does almost nothing and tweaks it slightly to do something
useful. (That's not to say it's does the wrong thing; I just don't
think the code is clear.)

So I decided to figure out what was going on with the original code.

First, CAP_SETPCAP is never obtainable (by anything).
Since cap_bset never has this bit set, nothing can inherit it
from fP. capset_check prevents it from getting set in pI.

Second, cap_bset is broken. For one thing, there's no way
to remove the caps you want to restrict from already-running
tasks. So I don't think it matters if we break/change it.


cap_bprm_set_security does:
fP = fI = (new_uid == 0 || new_euid == 0)
fE = (new_euid == 0)

cap_bprm_apply_creds then does:
1. pP' = (fP & X) | (fI & pI)
2. unsafeness stuff
3. fix up uids
4. pE' = fE & pP'

Now for the fun part:

Since fP == fI, pP' = fP & (X | pI)
But X | pI == X (since pI can't have CAP_SETPCAP and X==~CAP_SETPCAP)
So pP' = fP & X

pE' == fE & pP' == fE & fP & X
But fE & fP == fE, so:
pE' = fE & X

The whole result is just
pP' = (uid == 0 || euid==0) & X
pE' = (euid == 0) & X


This patch implements this. It should be invisible to userspace
(unless userspace (ab)uses cap_bset). It also adds a secureexec
flag, which we both need.

First, did I get this right? It seems to work :)

Second, do you have any objection to both of us redoing our
patches against this one? It should make them nicer-looking
at least.

--Andy

--- 2.6.6-mm4/fs/exec.c~cap_cleanup 2004-05-19 10:37:25.000000000 -0700
+++ 2.6.6-mm4/fs/exec.c 2004-05-21 19:03:16.000000000 -0700
@@ -1093,6 +1093,7 @@


bprm.loader = 0;
bprm.exec = 0;
bprm.security = NULL;
+ bprm.secflags = 0;
bprm.mm = mm_alloc();
retval = -ENOMEM;
if (!bprm.mm)

--- 2.6.6-mm4/security/commoncap.c~cap_cleanup 2004-05-21 18:51:42.000000000 -0700
+++ 2.6.6-mm4/security/commoncap.c 2004-05-23 01:42:51.000000000 -0700
@@ -89,44 +89,20 @@



int cap_bprm_set_security (struct linux_binprm *bprm)
{

- /* Copied from fs/exec.c:prepare_binprm. */
-
- /* We don't have VFS support for capabilities yet */
- cap_clear (bprm->cap_inheritable);
- cap_clear (bprm->cap_permitted);
- cap_clear (bprm->cap_effective);
-
- /* To support inheritance of root-permissions and suid-root
- * executables under compatibility mode, we raise all three
- * capability sets for the file.
- *
- * If only the real uid is 0, we only raise the inheritable
- * and permitted sets of the executable file.
- */
-
- if (!issecure (SECURE_NOROOT)) {
- if (bprm->e_uid == 0 || current->uid == 0) {
- cap_set_full (bprm->cap_inheritable);
- cap_set_full (bprm->cap_permitted);
- }
- if (bprm->e_uid == 0)
- cap_set_full (bprm->cap_effective);
- }
return 0;


}

void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)

{
- /* Derived from fs/exec.c:compute_creds. */

- kernel_cap_t new_permitted, working;
+ kernel_cap_t new_pP, new_pE;

- new_permitted = cap_intersect (bprm->cap_permitted, cap_bset);
- working = cap_intersect (bprm->cap_inheritable,
- current->cap_inheritable);
- new_permitted = cap_combine (new_permitted, working);
+ if (bprm->e_uid == 0 || current->uid == 0)
+ new_pP = cap_bset;
+ else
+ cap_clear(new_pP);

if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
- !cap_issubset (new_permitted, current->cap_permitted)) {
+ !cap_issubset (new_pP, current->cap_permitted)) {
current->mm->dumpable = 0;

if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
@@ -134,13 +110,16 @@
bprm->e_uid = current->uid;
bprm->e_gid = current->gid;
}
- if (!capable (CAP_SETPCAP)) {
- new_permitted = cap_intersect (new_permitted,
- current->cap_permitted);
- }
+ new_pP = cap_intersect (new_pP,
+ current->cap_permitted);
}
}

+ if (bprm->e_uid == 0)
+ new_pE = new_pP;
+ else
+ cap_clear(new_pE);


+
current->suid = current->euid = current->fsuid = bprm->e_uid;

current->sgid = current->egid = current->fsgid = bprm->e_gid;

@@ -148,9 +127,8 @@
* in the init_task struct. Thus we skip the usual
* capability rules */
if (current->pid != 1) {
- current->cap_permitted = new_permitted;
- current->cap_effective =
- cap_intersect (new_permitted, bprm->cap_effective);


+ current->cap_permitted = new_pP;
+ current->cap_effective = new_pE;
}

/* AUD: Audit candidate if current->cap_effective is set */
@@ -160,13 +138,9 @@



int cap_bprm_secureexec (struct linux_binprm *bprm)
{
- /* If/when this module is enhanced to incorporate capability
- bits on files, the test below should be extended to also perform a
- test between the old and new capability sets. For now,
- it simply preserves the legacy decision algorithm used by
- the old userland. */
return (current->euid != current->uid ||
- current->egid != current->gid);
+ current->egid != current->gid ||
+ (bprm->secflags & BINPRM_SEC_SECUREEXEC));
}

int cap_inode_setxattr(struct dentry *dentry, char *name, void *value,

--- 2.6.6-mm4/include/linux/binfmts.h~cap_cleanup 2004-05-19 10:37:25.000000000 -0700
+++ 2.6.6-mm4/include/linux/binfmts.h 2004-05-23 02:24:31.034876220 -0700
@@ -20,6 +20,7 @@


/*
* This structure is used to hold the arguments that are used when loading binaries.
*/

+#define BINPRM_SEC_SECUREEXEC 1


struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
struct page *page[MAX_ARG_PAGES];

@@ -28,6 +29,7 @@


int sh_bang;
struct file * file;
int e_uid, e_gid;

+ int secflags;
kernel_cap_t cap_inheritable, cap_permitted, cap_effective;


void *security;
int argc, envc;

Olaf Dietsche

unread,
May 23, 2004, 3:00:09 PM5/23/04
to
Andy Lutomirski <lu...@myrealbox.com> writes:

> You don't like my patch because it rips out a bunch of code and it's
> not clear it won't break stuff.
>
> I don't like your patch because it takes a bunch of incomprehensible
> code that does almost nothing and tweaks it slightly to do something
> useful. (That's not to say it's does the wrong thing; I just don't
> think the code is clear.)
>
> So I decided to figure out what was going on with the original code.
>
> First, CAP_SETPCAP is never obtainable (by anything).
> Since cap_bset never has this bit set, nothing can inherit it
> from fP. capset_check prevents it from getting set in pI.

# mv /sbin/init /sbin/init.bin
# cat >/sbin/init
#! /bin/sh

if test $$ -eq 1; then
mount /proc
echo -1 >/proc/sys/kernel/cap-bound
fi

exec /sbin/init.bin "$@"
^D
# chmod 755 /sbin/init
# reboot

> Second, cap_bset is broken. For one thing, there's no way
> to remove the caps you want to restrict from already-running
> tasks. So I don't think it matters if we break/change it.

Maybe I don't understand this, but I think this is what sys_capset()
is for.

> cap_bprm_set_security does:
> fP = fI = (new_uid == 0 || new_euid == 0)
> fE = (new_euid == 0)

Only if (!issecure (SECURE_NOROOT))

[...]


> The whole result is just
> pP' = (uid == 0 || euid==0) & X
> pE' = (euid == 0) & X
>
> This patch implements this. It should be invisible to userspace
> (unless userspace (ab)uses cap_bset). It also adds a secureexec
> flag, which we both need.
>
> First, did I get this right? It seems to work :)

With this patch you effectively revert all capable() calls back to
suser() tests. If this is what you intended, your patch looks fine.

> Second, do you have any objection to both of us redoing our
> patches against this one? It should make them nicer-looking
> at least.

I didn't scrutinize capabilities as thoroughly as you did, but I still
don't see why your patch is necessary, besides the changes in
fs/exec.c and include/binfmts.h, maybe.

$ cp commoncap.c lutocap.c
modify it to your liking
# insmod lutocap

same goes for chriscap.c

Please, don't get me wrong. For me, it's just a matter of maintaining
a slightly bigger fscaps patch. But I don't think capabilities in
Linux are really broken, only because some proponents of SELinux claim
so.

If you want a simpler - setuid like - capabilities model, throw out
the inheritable _and_ permitted set and use the effective set alone.

Regards, Olaf.

Andy Lutomirski

unread,
May 24, 2004, 7:50:05 PM5/24/04
to
>
> Hehe, arm wrestling could be entertaining ;-) I'm in favor of the most
> conservative change, which I feel is in my patch. But I'm game to
> continue to pick on each.


I like your legacy mode. I don't like making processes inherit
non-legacyness. (With your patch, some daemon might be secure
when started from initscripts but insecure when started from the
command line, if root ended up in non-legacy mode.)

You've also convinced me that an inheritable mask is good, because
it may make some IRIX apps work. It's also necessary for this patch
to be safe.

I don't like touching the inode in the security module (you
forgot to check nosuid, for one thing).

So here's another shot at it:

"Legacy mode" is controlled by a new bit in task_struct called
keep_all_caps (controlled by PRCTL_SET_KEEPALLCAPS). This bit turns
off setuid emulation completely (except for setfsuid).

The evolution rules are:
pP' = (fP & X) | (pI & pP) [with the setuid-nonroot fix]


pE' = (pE | fP) & pP'

pI' = full

This time around, I haven't touched the unsafeness rules.

The magic is in the setuid emulation:
if (current->uid == 0 || current->euid == 0)
cap_set_full(current->cap_inheritable);
else
cap_clear(current->cap_inheritable);

So, unless a program plays with it's inheritable mask,
root will not pick up caps on exec (which is good -- it
means it's safe to chroot somewhere, disable all caps
except CAP_SETUID, and let untrusted code play around.)
But, if you start as root and setuid away, _even with
keepcaps_, you lose the caps on exec. Which is the broken
behavior we want to preserve.

So, to avoid this, new code can either set keep_all_caps
or just explicitly enable inheritance after setuid, in
which case it just works.

I have pI' = full because otherwise it's just one more
(partially) user-controlled variable that programs need
to worry about. (And because anything else would break
root.)

As for the rest of the changes:

The code no longer assumes that pI<pP, so I yanked all checks
on the inheritable mask. On the other hand, it makes no
sense to me for capset when changing lots of processes'
masks to affect the inheritable mask. So I made it leave
it alone, except when changing current.

keep_all_caps is clearly not entirely necessary. I can take
it out if anyone objects.

I yanked all capset sanity checks from kernel/capability.c --
they were duplicates anyway.

And I left the old (IMHO pointless) behavior that one needs to hack
init in order to use CAP_SETPCAP.

[Side note: for cap_bset to be useful, I think there needs to be
an operation "atomically remove these caps from all tasks." I
don't see one.]

This patch also should work fine if VFS capabilities are
introduced (there's an fP mask which defaults to (setuid-
root ? full : 0).

Patch against 2.6.6-mm4 (-mm5 didn't like my filesystem...).
It's not as well tested as it should be. The old cap.cc
tool still works (but remember to set inheritable). I
don't have a tool yet to play with keep_all_caps.

This depends on my cleanup patch from Saturday.

--Andy

Signed-off-by: Andy Lutomirski (lu...@myrealbox.com)

fs/exec.c | 8 +++++-
include/linux/binfmts.h | 2 +
include/linux/capability.h | 2 -
include/linux/init_task.h | 1
include/linux/prctl.h | 4 +++
include/linux/sched.h | 1
kernel/capability.c | 13 ----------
kernel/sys.c | 11 +++++++++
security/commoncap.c | 54 +++++++++++++++++++++++++++++++--------------
9 files changed, 64 insertions(+), 32 deletions(-)

--- 2.6.6-mm4/fs/exec.c~cap_2 2004-05-23 22:22:42.000000000 -0700
+++ 2.6.6-mm4/fs/exec.c 2004-05-23 22:43:22.000000000 -0700
@@ -886,8 +886,10 @@



if(!(bprm->file->f_vfsmnt->mnt_flags & MNT_NOSUID)) {
/* Set-uid? */
- if (mode & S_ISUID)
+ if (mode & S_ISUID) {
bprm->e_uid = inode->i_uid;
+ bprm->secflags |= BINPRM_SEC_SETUID;
+ }

/* Set-gid? */
/*
@@ -895,8 +897,10 @@


* is a candidate for mandatory locking, not a setgid
* executable.
*/
- if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
+ if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
bprm->e_gid = inode->i_gid;
+ bprm->secflags |= BINPRM_SEC_SETGID;
+ }
}

/* fill in binprm security blob */

--- 2.6.6-mm4/security/commoncap.c~cap_2 2004-05-23 22:15:39.000000000 -0700
+++ 2.6.6-mm4/security/commoncap.c 2004-05-24 14:32:43.210834050 -0700
@@ -57,13 +57,6 @@
kernel_cap_t *inheritable, kernel_cap_t *permitted)
{


/* Derived from kernel/capability.c:sys_capset. */

- /* verify restrictions on target's new Inheritable set */
- if (!cap_issubset (*inheritable,
- cap_combine (target->cap_inheritable,
- current->cap_permitted))) {
- return -EPERM;
- }
-
/* verify restrictions on target's new Permitted set */
if (!cap_issubset (*permitted,
cap_combine (target->cap_permitted,
@@ -83,12 +76,19 @@


kernel_cap_t *inheritable, kernel_cap_t *permitted)
{
target->cap_effective = *effective;
- target->cap_inheritable = *inheritable;
target->cap_permitted = *permitted;

+ if (current == target)


+ target->cap_inheritable = *inheritable;
}

int cap_bprm_set_security (struct linux_binprm *bprm)
{

+ /* Pretend we have VFS capabilities */
+ if ((bprm->secflags & BINPRM_SEC_SETUID) && bprm->e_uid == 0)
+ cap_set_full(bprm->cap_permitted);
+ else
+ cap_clear(bprm->cap_permitted);
+

return 0;
}

@@ -96,10 +96,9 @@
{
kernel_cap_t new_pP, new_pE;



- if (bprm->e_uid == 0 || current->uid == 0)

- new_pP = cap_bset;
- else
- cap_clear(new_pP);
+ new_pP = cap_combine(cap_intersect(bprm->cap_permitted, cap_bset),
+ cap_intersect(current->cap_permitted,
+ current->cap_inheritable));



if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||

!cap_issubset (new_pP, current->cap_permitted)) {
@@ -115,10 +114,15 @@


}
}

- if (bprm->e_uid == 0)

- new_pE = new_pP;
- else
- cap_clear(new_pE);


+ /* setuid-nonroot is special. */

+ if (bprm->e_uid != 0 && current->uid != 0 && current->euid == 0)
+ cap_clear(new_pP);
+
+ new_pE = cap_combine(current->cap_effective, bprm->cap_permitted);
+ cap_mask(new_pE, new_pP);


+
+ if (!cap_issubset(new_pP, current->cap_permitted))
+ bprm->secflags |= BINPRM_SEC_SECUREEXEC;

current->suid = current->euid = current->fsuid = bprm->e_uid;
current->sgid = current->egid = current->fsgid = bprm->e_gid;

@@ -129,11 +133,13 @@


if (current->pid != 1) {

current->cap_permitted = new_pP;
current->cap_effective = new_pE;
+ cap_set_full(current->cap_inheritable);


}

/* AUD: Audit candidate if current->cap_effective is set */

current->keep_capabilities = 0;
+ current->keep_all_caps = 0;


}

int cap_bprm_secureexec (struct linux_binprm *bprm)

@@ -191,10 +197,18 @@
* Keeping uid 0 is not an option because uid 0 owns too many vital
* files..
* Thanks to Olaf Kirch and Peter Benie for spotting this.
+ *
+ * luto - New behavior, May '04
+ * To emulate old evolution rules, inheritable tracks uid and euid.
+ * prctl(PRCTL_SET_KEEPALLCAPS) disables this. In fact, keepallcaps
+ * disables this whole function. BE CAREFUL ABOUT THE EFFECTIVE MASK!!!
*/
static inline void cap_emulate_setxuid (int old_ruid, int old_euid,
int old_suid)
{
+ if (current->keep_all_caps)
+ return;
+
if ((old_ruid == 0 || old_euid == 0 || old_suid == 0) &&
(current->uid != 0 && current->euid != 0 && current->suid != 0) &&
!current->keep_capabilities) {
@@ -207,6 +221,11 @@
if (old_euid != 0 && current->euid == 0) {
current->cap_effective = current->cap_permitted;
}
+
+ if (current->uid == 0 || current->euid == 0)
+ cap_set_full(current->cap_inheritable);
+ else
+ cap_clear(current->cap_inheritable);
}

int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid,
@@ -230,6 +249,9 @@
/*
* FIXME - is fsuser used for all CAP_FS_MASK capabilities?
* if not, we might be a bit too harsh here.
+ *
+ * This explicitly ignores keep_all_caps. The
+ * potential for error is just too large.
*/

if (!issecure (SECURE_NO_SETUID_FIXUP)) {
--- 2.6.6-mm4/kernel/sys.c~cap_2 2004-05-23 22:15:24.000000000 -0700
+++ 2.6.6-mm4/kernel/sys.c 2004-05-23 22:57:40.000000000 -0700
@@ -1646,6 +1646,17 @@
}
current->keep_capabilities = arg2;
break;
+ case PR_GET_KEEPALLCAPS:
+ if (current->keep_all_caps)
+ error = 1;
+ break;
+ case PR_SET_KEEPALLCAPS:
+ if (arg2 != 0 && arg2 != 1) {
+ error = -EINVAL;
+ break;
+ }
+ current->keep_all_caps = arg2;
+ break;
default:
error = -EINVAL;
break;
--- 2.6.6-mm4/kernel/capability.c~cap_2 2004-05-23 23:11:49.000000000 -0700
+++ 2.6.6-mm4/kernel/capability.c 2004-05-23 23:13:14.000000000 -0700
@@ -174,19 +174,6 @@
if (security_capset_check(target, &effective, &inheritable, &permitted))
goto out;

- if (!cap_issubset(inheritable, cap_combine(target->cap_inheritable,
- current->cap_permitted)))
- goto out;
-
- /* verify restrictions on target's new Permitted set */
- if (!cap_issubset(permitted, cap_combine(target->cap_permitted,
- current->cap_permitted)))
- goto out;
-
- /* verify the _new_Effective_ is a subset of the _new_Permitted_ */
- if (!cap_issubset(effective, permitted))
- goto out;
-
ret = 0;

/* having verified that the proposed changes are legal,
--- 2.6.6-mm4/include/linux/capability.h~cap_2 2004-05-23 22:42:22.000000000 -0700
+++ 2.6.6-mm4/include/linux/capability.h 2004-05-23 22:42:34.000000000 -0700
@@ -309,7 +309,7 @@


#define CAP_EMPTY_SET to_cap_t(0)
#define CAP_FULL_SET to_cap_t(~0)
#define CAP_INIT_EFF_SET to_cap_t(~0 & ~CAP_TO_MASK(CAP_SETPCAP))

-#define CAP_INIT_INH_SET to_cap_t(0)
+#define CAP_INIT_INH_SET CAP_FULL_SET

#define CAP_TO_MASK(x) (1 << (x))
#define cap_raise(c, flag) (cap_t(c) |= CAP_TO_MASK(flag))
--- 2.6.6-mm4/include/linux/sched.h~cap_2 2004-05-23 22:04:50.000000000 -0700
+++ 2.6.6-mm4/include/linux/sched.h 2004-05-23 22:11:15.000000000 -0700
@@ -462,6 +462,7 @@
struct group_info *group_info;
kernel_cap_t cap_effective, cap_inheritable, cap_permitted;
int keep_capabilities:1;
+ int keep_all_caps:1;
struct user_struct *user;
/* limits */
struct rlimit rlim[RLIM_NLIMITS];
--- 2.6.6-mm4/include/linux/prctl.h~cap_2 2004-05-23 22:12:18.000000000 -0700
+++ 2.6.6-mm4/include/linux/prctl.h 2004-05-23 22:13:35.000000000 -0700
@@ -20,6 +20,10 @@
#define PR_GET_KEEPCAPS 7
#define PR_SET_KEEPCAPS 8

+/* Get/set whether to emulate old capability behavior */
+#define PR_GET_KEEPALLCAPS 15
+#define PR_SET_KEEPALLCAPS 16
+
/* Get/set floating-point emulation control bits (if meaningful) */
#define PR_GET_FPEMU 9
#define PR_SET_FPEMU 10
--- 2.6.6-mm4/include/linux/binfmts.h~cap_2 2004-05-23 22:27:43.000000000 -0700
+++ 2.6.6-mm4/include/linux/binfmts.h 2004-05-23 22:27:58.000000000 -0700
@@ -21,6 +21,8 @@


* This structure is used to hold the arguments that are used when loading binaries.
*/

#define BINPRM_SEC_SECUREEXEC 1
+#define BINPRM_SEC_SETUID 2
+#define BINPRM_SEC_SETGID 4


struct linux_binprm{
char buf[BINPRM_BUF_SIZE];
struct page *page[MAX_ARG_PAGES];

--- 2.6.6-mm4/include/linux/init_task.h~cap_2 2004-05-23 22:11:27.000000000 -0700
+++ 2.6.6-mm4/include/linux/init_task.h 2004-05-23 22:11:44.000000000 -0700
@@ -96,6 +96,7 @@
.cap_inheritable = CAP_INIT_INH_SET, \
.cap_permitted = CAP_FULL_SET, \
.keep_capabilities = 0, \
+ .keep_all_caps = 0, \
.rlim = INIT_RLIMITS, \
.user = INIT_USER, \
.comm = "swapper", \

Chris Wright

unread,
May 24, 2004, 8:00:11 PM5/24/04
to
* Andy Lutomirski (lu...@myrealbox.com) wrote:
> >
> > Hehe, arm wrestling could be entertaining ;-) I'm in favor of the most
> > conservative change, which I feel is in my patch. But I'm game to
> > continue to pick on each.
>
>
> I like your legacy mode. I don't like making processes inherit
> non-legacyness. (With your patch, some daemon might be secure
> when started from initscripts but insecure when started from the
> command line, if root ended up in non-legacy mode.)

Hmm, that was intentional (my very first cut at this thing cleared it,
but that patch had many other broken behaviours). Specifically because
it goes through pI, which POSIX draft says is untouched through exec.

> You've also convinced me that an inheritable mask is good, because
> it may make some IRIX apps work. It's also necessary for this patch
> to be safe.
>
> I don't like touching the inode in the security module (you
> forgot to check nosuid, for one thing).

Yup, I changed that since then, using the secflags approach you did.

> So here's another shot at it:
>
> "Legacy mode" is controlled by a new bit in task_struct called
> keep_all_caps (controlled by PRCTL_SET_KEEPALLCAPS). This bit turns
> off setuid emulation completely (except for setfsuid).

I had same idea. I wished we could hijack keep_capabilities as a
bit vector.

> The evolution rules are:
> pP' = (fP & X) | (pI & pP) [with the setuid-nonroot fix]
> pE' = (pE | fP) & pP'
> pI' = full
>
> This time around, I haven't touched the unsafeness rules.
>
> The magic is in the setuid emulation:
> if (current->uid == 0 || current->euid == 0)
> cap_set_full(current->cap_inheritable);
> else
> cap_clear(current->cap_inheritable);
>
> So, unless a program plays with it's inheritable mask,
> root will not pick up caps on exec (which is good -- it
> means it's safe to chroot somewhere, disable all caps
> except CAP_SETUID, and let untrusted code play around.)
> But, if you start as root and setuid away, _even with
> keepcaps_, you lose the caps on exec. Which is the broken
> behavior we want to preserve.
>
> So, to avoid this, new code can either set keep_all_caps
> or just explicitly enable inheritance after setuid, in
> which case it just works.
>
> I have pI' = full because otherwise it's just one more
> (partially) user-controlled variable that programs need
> to worry about. (And because anything else would break
> root.)

How do you keep passing down the same caps through multiple execs?

> As for the rest of the changes:
>
> The code no longer assumes that pI<pP, so I yanked all checks
> on the inheritable mask. On the other hand, it makes no
> sense to me for capset when changing lots of processes'
> masks to affect the inheritable mask. So I made it leave
> it alone, except when changing current.
>
> keep_all_caps is clearly not entirely necessary. I can take
> it out if anyone objects.
>
> I yanked all capset sanity checks from kernel/capability.c --
> they were duplicates anyway.
>
> And I left the old (IMHO pointless) behavior that one needs to hack
> init in order to use CAP_SETPCAP.
>
> [Side note: for cap_bset to be useful, I think there needs to be
> an operation "atomically remove these caps from all tasks." I
> don't see one.]

Yeah. It depends on the definition of useful. Get a couple privileged
tasks running (which may fork/exec from time to time), then clamp down
the machine is one form of useful. In general, I don't cap_bset is that
useful though.

> This patch also should work fine if VFS capabilities are
> introduced (there's an fP mask which defaults to (setuid-
> root ? full : 0).
>
> Patch against 2.6.6-mm4 (-mm5 didn't like my filesystem...).
> It's not as well tested as it should be. The old cap.cc
> tool still works (but remember to set inheritable). I
> don't have a tool yet to play with keep_all_caps.

I can add this to the test stuff to play with it.

thanks,
-chris
--
Linux Security Modules http://lsm.immunix.org http://lsm.bkbits.net

Andy Lutomirski

unread,
May 24, 2004, 8:30:10 PM5/24/04
to
Chris Wright wrote:

> * Andy Lutomirski (lu...@myrealbox.com) wrote:
>
>>>Hehe, arm wrestling could be entertaining ;-) I'm in favor of the most
>>>conservative change, which I feel is in my patch. But I'm game to
>>>continue to pick on each.
>>
>>
>>I like your legacy mode. I don't like making processes inherit
>>non-legacyness. (With your patch, some daemon might be secure
>>when started from initscripts but insecure when started from the
>>command line, if root ended up in non-legacy mode.)
>
>
> Hmm, that was intentional (my very first cut at this thing cleared it,
> but that patch had many other broken behaviours). Specifically because
> it goes through pI, which POSIX draft says is untouched through exec.

Not in IRIX, though. And I'm afraid of:

cap -c all-i <some setuid binary>
versus
cap -c all+i <some setuid binary>

Suddently the binary's behavior might be different. This isn't
inheritantly bad, but it seems like a pointless gotcha.

I like my version of using inheritable for legaciness, but only because my
inheritable semantics make sense. Your version would worry me a lot less
if you just added a new field. But mine doesn't actually need the new field ;)

>>
>>"Legacy mode" is controlled by a new bit in task_struct called
>>keep_all_caps (controlled by PRCTL_SET_KEEPALLCAPS). This bit turns
>>off setuid emulation completely (except for setfsuid).
>
>
> I had same idea. I wished we could hijack keep_capabilities as a
> bit vector.

It's a bitfield. Just add fields -- no cost in memory. Fairly large cost
in compile time, though...

This only takes effect when set*uid is called successfully. It bites
programs that start as non-root with CAP_SETUID and change their uid, but
these programs either don't exist or don't work at all right now.

[root@luto andy]# cap -c all+i -u andy bash
[andy@luto andy]$ dumpcap [note second exec]
Real Eff
User 500 500
Group 500 500

Caps: =ip cap_setpcap-p

>
>
>>As for the rest of the changes:
>>
>>The code no longer assumes that pI<pP, so I yanked all checks
>>on the inheritable mask. On the other hand, it makes no
>>sense to me for capset when changing lots of processes'
>>masks to affect the inheritable mask. So I made it leave
>>it alone, except when changing current.
>>
>>keep_all_caps is clearly not entirely necessary. I can take
>>it out if anyone objects.
>>
>>I yanked all capset sanity checks from kernel/capability.c --
>>they were duplicates anyway.
>>
>>And I left the old (IMHO pointless) behavior that one needs to hack
>>init in order to use CAP_SETPCAP.
>>
>>[Side note: for cap_bset to be useful, I think there needs to be
>>an operation "atomically remove these caps from all tasks." I
>>don't see one.]
>
>
> Yeah. It depends on the definition of useful. Get a couple privileged
> tasks running (which may fork/exec from time to time), then clamp down
> the machine is one form of useful. In general, I don't cap_bset is that
> useful though.

Especially with CAP_SYS_ADMIN... SELinux is clearly the way to go here.

I just discovered a patch
(http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.4-fcap/README)
that claims to implement per-process-tree maximum cap masks (like I did for
awhile). It hasn't been maintained, though.

If one of our patches hits -mm or -linus, I may try and add a feature like
that. It'll (rightly) annoy the SELinux folks, though.

>
>
>>This patch also should work fine if VFS capabilities are
>>introduced (there's an fP mask which defaults to (setuid-
>>root ? full : 0).
>>
>>Patch against 2.6.6-mm4 (-mm5 didn't like my filesystem...).
>>It's not as well tested as it should be. The old cap.cc
>>tool still works (but remember to set inheritable). I
>>don't have a tool yet to play with keep_all_caps.
>
>
> I can add this to the test stuff to play with it.

Except that I fail a lot of your tests because of inheritable mask
differences. Oh, well.

I may revive my ext3 caps patch sometime. Is there a way to make that work
with your patch?

--Andy

0 new messages